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SYNOPSIS 

The advent of modern high-speed computers has greatly 
enhanced the applicability of matrices in all the walks of 
science and engineering, for instance, in the stability of 
biological, physical and social systems and control theory. 

In fact, most of the engineering problems in order to get 
solved are to be discretized and the corresponding original 
questions lead to interesting matrix problems. Earlier, 
the bottleneck v/as the large size of the corresponding 
discretized systems and this now gets removed to a large 
extent as the faster and more versatile computer systems are 
coming to our aid. In view of this, matrix theory with the 
influx of varied new problems has now become all the more 
active and interesting area of research. 

The present thesis is concerned with three categories 
of problems in matrix theory. The first of these concerns 
the foundations and consists of the development of a notion 
of angularity which generalizes the well-lmown concept of 
inertia due to Ostrowski and Schneider, The second topic 
arises due to mathematical curiosity as to what are the 
most general linear transformations on matrices and vectors 
which possess certain natural invariants. The third and 
the last topic studied is application-oriented and deals 
with the iterative numerical solutions of the Lyapunov matrix 
equation AX+XA*=C. 



Indeed, the notion of inertia is immediately connected 
with the Tstahility of a linear system x=Ax and a way of 
determining which is to see whether the corresponding Lyapunov 
matrix equation mentioned above for a negative definite C 
leads to a positive definite solution matrix X, Thus the 
first and the third topics of the present thesis are 
interconnected. Since in the second topic, we also study 
certain inertia- and angularity-preserving transformations, 
the first two topics are as well related. 

The thesis consists of four chapters. A chapteirwise 
summary is as follows; 

Chapter 1 is introductory in nature and is intended to 
provide a reasonable up-to-date survey of the three areas 
closely related with our thesis material. These are inertia 
theory, linear transformations with invariants and the matrix 
equation AX+XB=C. 

In Chapter 2, we introduce and study a notion of 
angularity of a matrix as a generalization of the well-known 
concept of Inertia. The inertia In(A) of an nxn complex 
matrix A is defined as an ordered triple (tc(A), '^(A) , 5(A)), 
the entries denoting the total number of eigenvalues of A 
with positive, negative, and zero real parts respectively. 

The inertia In (A) thus depends on the distribution of 
arguments of eigenvalues of A. This dependence becomes 
complete in the case of angularity 0[A] of A. In order to 
define 0[A], we first define the ray space n of the complex 
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plene C felloe, 'i-og Cain as the set {{0}} U {e 3R_^s 0^9'^ 2%} ^ 
where iK_j_ = (0,°°) „ A general element of is called a ray 
and is denoted by 5 w = { 0 } being called a nu.ll ray and 
w = e 3R+ (0 _< 9 < 2ti) being a proper ray. Now the 
angularity 9 [a] of an nxn complex ma.trix A is defined as 
a mapping from n to the set of nontiegative integers for 
which 9 [a]^ is the number of eigenvalues of A (counting 
multiplicities) lying on the ray m . 

The results of this chapter are mostly related to 
angularity of normal matrices , Many angularity theorems 
are proved there and the main result says that if B and C 
are nonsingular matrices then B*AB and C*AC have the same 
angularity provided they are normal. Some well-known 
inertia theorems, for example, Sylvester's law of inertia 
have been deduced as corollaries of this main result. 

The case when C is permitted to be singular is discussed 
next. Then the quantitatively sharpened results of 
Ostrowski and those of Thompson on Sylvester's law have 
been extended to normal matrices. Further we define 
totally normal matrices as those having all their principal 
submatrices as normal and prove that a matrix of order 2. 5 
is totally normal if and only if all its second and third 
order principal submatrices are normal. Finally, by 
making use of the main result of this chapter, we prove 
an angularity result for partitioned normal matrices which 
states that if A is a normal matrix expressed in the 



of the original function such as nonnegetivity, monotonicity , 
number of os cillations (variations) , number of zeros etOo 
are tlien to be inferred from tho corresponding notions 
about the components of the vectors. From this point of 
view* it is natural to characterize matrix transformations 
which preserve such structural characteristics of the 
vectors. In this connection, we determine matrioes which 
preserve properties such as nonnegativity, variations of 
various order, number of zeros, nondecreasing trend etc. 
Finally we determine linear transformations on the set of 
circulaiits which preserve inertia and angularity. 

The last chapter is motivated by the works of Hoskins , 
Meek and V/alton on the iterative methods for the nmerical 
Solution of the Lyapunov matrix equation AX+XA*=C. Even 
though quite stable and economical direct methods such as the 
Bart els -Stewart algorithm are available for the problem, 
due to the surprisingly fast convergence (about 5 iterates) 
of some of the iterative methods these seem to deserve a 
more extensive mathematical analysis. In this context, 
we consider two such iterative procedures and establish 
their theoretical convergence for certain general classes 
of matrices. 

Also keeping in mind very large, sparse, inconsistent, 
singular or underdetermined general systems AX+XB=C for 
which direct methods are often of no avail, we propose 
compact implementations of projection and residual projection 



partitioned form (A-.).. ^ with A 

^ >|c 

A^^ nonsi'a£-plar and A-^ ^ A-^ p - A^^Ap^, 
e[A] = 0 [A ] + ©[B ] , 

Oj -L m 


and A22 feeing normal* 
then 

for all w e n 


where 


®22 “ ^22 


^ 21 ^ 11 ^ 12 ' 


This in fact generalizes an interesting result due to 
Haynsworth on the inertia of a partitioned Hermitian matrix. 


The next chapter is devoted to the study of linear 
transformations on matrices and vectors. We first prove 
that (a) any linear transformation T* on the set of n fey n 
complex matrioes* mapping Hermitian matrices into themselves 
and preserving the inertia of each Hermitian matrix is of 
the form T(A) = C*AC or T(A) = C*A’C where C is some 
nonsingular matrix and A' denotes the transpose of A and 
that (b) any linear transformation T mapping normal matrices 
into normal matrices and preserving the angularity of each 
normal matrix is also of one of the above forms, but with 
C=kU where and U is unitary. Surprisingly, it turns 
out that a linear transformation T mapping normal matrices 
into normal matrices- preserves inertia of each normal matrix 
if and only if it preserves the angularity of each normal 
matrix. 

The vectors which arise while discretizing a continuous 
function at nodal points can be viewed as defining a function 
on a discrete ordered set. The structural characteristics 



methods for such systems, respectively for minimum nom 
end least squares solutions, which are iterative in nature 
and do not lead to excessive memory-’ prohlem.s . The 
convergence of these procedures is always guaranteed as it 
can be easily inferred from the corresponding results for 
a full system Ax=bo 

The referenoes in the thesis are separately collected 
mainly under the three headings corresponding to the three 
maior topics mentioned before. Also, to make the thesis 
useful for other workers, whenever accessible, along with 
the most of the references their Mathematical Review and Zbl . 
abstract numbers have been quoted. Parts of Chapter 2 
and Chapter 3 have been accepted for publication in the 
Journal, Linear Algebra and its Applioations . 



1. THREE TOPICS IN MATRIX THEORY? A SURVEY 


1.1. Introduction 

As the title of the thesis indicates, in the chapters to 
follow we study some problems related to (a) angularity which 
is a generalization of the notion of inertia, (b) linear 
transformations having invariant subsets and functionals, and 
(c) iterative numerical procedures for the solution of the 
Lyapunov and some more general matrix equations. In the 
following three sections of this introductory chapter, we have 
tried to present a reasonably up-to-date survey of the past work 
having a bearing upon the above three topics. The topics of 
survey are (i) inertia theory (ii) linear transformations with 
invariants and (iii) the matrix equation AX+XB=C. 

The first two topics considered in the thesis derive their 
importance from their use in the mathematical analysis of various 
matrix problems while the last topic is important because of its 
many practical applications which to some extent have been 
elaborated in the discussion to follow. 

Indeed, as is well known, many physical, biological, social 
and economic systems ultimately have the mathematical descriptior 
or model defined by the vector-matrix equation 

= Ax (1.1.1) 

! 

where A is a given, constant nxn matrix and x is an 
n-dimensional column vector to be determined to satisfy certain 
initial conditions. In the above equation, x denotes the 


1 
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differentiation of x with respect to time t. The study of 
stability of such systems has been important in its own right 
since an unstable system is never acceptable. 

It is a well-established fact that the system governed by 
(1.1.1) is asymptotically stable in the sense that every solutior 
vector x(t) of ( 1.1.1) approaches zero as t -*■0° if and only if 
(iff) A is stable, i.e., all the eigenvalues of A have negative 
real parts (see Bellman [4, p.250]) , Conditions for A to be 
stable can be expressed in terms of the coefficients of the 
characteristic polynomial of A, the famous one being the 
Routh-Hurwitz criterion [7, Vol.II, p,194j. For an account of 
such stability criteria and related problems, one may refer, 
for instance, Anderson [24], Barnett [ 1,27, 28], Barnett and 
Siljak [ 30 ], Barnett and Storey [ 3 ], Duffin [55], Fuller [ 56 ], 
Gantmacher [ 7 , Vol.Il], Householder £ 62 ], Lancaster [l3], 

Lehnigk [l4], Harden [17]* Taussky [84] and Wall [87] and the 
references given therein. 

In practice, these classical approaches of testing the 
stability suffer from the difficulty of computing accurately 
the coefficients of the characteristic polynomial. This 
formidable job of computing these coefficients can very well be 
avoided if the following well-known result due to Lyapunov is 
utilized. It says that (see Bellman [4, p.254], Gantmacher 
[ 7 , Vol.II, p.189]) an nx n real matrix A is stable iff the 
matrix equation 

A% + XA = - I, 

A^ denoting the transpose of A and I denoting the nxn 


( 1 . 1 . 2 ) 
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identity matrix, has a symmetric positive definite solution X. 
The above result can be generalized to a complex matrix A also. 
Correspondingly, the matrix equation (1.1.2) is replaced by 

A*X + XA = - I (1.1.3) 

where A* denotes the conjugate transpose of A and in this case 
the solution matrix X should be Hermitian positive definite. 

In fact, Lyapunov's theorem holds even if I is replaced by any 
symmetric (Hermitian) positive definite matrix P in (l.l,2) 
((1.1.3)). 

Thus the stability problem, namely the problem of knowing 
whether the matrix A has all its eigenvalues in the open left 
half plane involves the solution of (l.l,2) or (1.1.3) and once 
this solution is obtained then its positive definiteness can be 
tested by the Sylvester determinants! conditions £31] » e.g., 
through Gaussian elimination. 

An equation of the form (1.1.2) or (1.1.3) or in 'general, 
that of the form 

A*X+ XA = C (1.1.4) 

or 

AX + XA*= C (1.1.5) 

where C is a given nxn complex matrix is called the Lyapunov 
matrix equation. Apart from determining the stability of A, 
there are so many applications of solutions of the Lyapunov 
matrix equation and its more general form 

AX + XB = C (1.1.6) 

called the Sylvester equation [232], where A, B, C are given 
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matrices of order mxm, nxn and mxn respectively and X is an 
mxn matrix to be solved for. Two such applications are 
explained below, while several more will be listed in Section 1.4. 

In the theory of control processes an important problem 
[4, 169] is to evaluate the quadratic cost functional of the form 

oo 

J = / x'^Bx dt (l.l,7) 

0 

with X satisfying x=A>c, x(0)=c, A being a real stable matrix 
of order n. In (l,l,7) , x^ denotes the transpose of x and B 
is a given nxn real matrix. If we suppose 

xr^Ebc = - -^(x^Cbc) (l.l.S) 

dt 

then we have 

A^Q + QA = - B (1.1.9) 

which is obtained by expanding the ri^t hand side of 
(l.l.S) and then substituting x = Ax. In view of (l.l.S) , 
the integral mentioned in (1.1.7) becomes 

J = - [s?Qx]^^^+ [x^Qx]^^q. (1.1.10) 

On the assumption that x tends to zero as t -*• oo and x(0)=c, 
it follows that 

J = c^Qc (l.l.ll) 

which can be easily evaluated, once we know Q, the solution 
of ( 1 . 1 . 9 ) . Thus we see in this process that the solution of 
the Lyapunov matrix equation enables to avoid the problem of | 

solving the system of differential equations as well as the , 

cumbersome job of evaluating the required improper integral. 



5 


The other example illustrates how -we can use the solution 
of the Sylvester equation to solve certain large linear systems 
arising in boundary value problems. Suppose we wish to solve 
the two dimensional Poisson’s equation 

( 1 . 1 . 12 ) 




u _ 


f(x,y) 


over a rectangular region covered by a square grid of side h^ 
say with rs interior points arranged in r rows and s columns. 
Denoting a function value at the grid point (ph,qh) ^7 
we get the system 

H.q ■ S-l,q Vl.q “p.9-1 

P“1 9 • . . $ s ^ q**! (1.1.13) 

as a finite difference approximation in which u’s lying on 
the boundary are given their prescribed values [ 20 , p, 2923 . 

If the equations for the points on the first interior row are 
written in order, i.e., with q=l, p=l,...,s, followed by the 
equations for the points on the second, third rows etc,, we 
obtain the matrix-vector form of the above linear system of 
orxier rs as 

Mx = c ( 1 . 1 . 14 ) 

where M = (M. .) • --t r. partitioned form having each 

M. . as a square matrix of order s, defined by 

J, J 


M 


10 


A if i= 3 , 

{ -Ig if li-ol=l, 

0 otherwise 


where A = (a^-^) is a tridiagonal matrix with 

J- J 
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4 for 1=0 » 

= { -1 for li-ol=ls 
0 otherwise 


and Ig is the identity matrix of order s (see Bickley and 
McNamee [l73]). It may be observed that x appearing in 
( 1,1.14) has the transpose 




(u^l, ^^ 2 * •••» -s2' “lr» ”sr' 

Of course, the elements of c depend on f(x,y) and on the 
prescribed values of u at the boundary points. It is not 
difficult to see that (l,l.l4) is equivalent to 


((I„t& A) + (B"^ 0 Is))^ = ^ 


(1.1.15) 


,T 


j I 

where ® denotes the Kronecker product [4, p,235] and B 
is the transpose of B and B = (b,-^) is a tridiagonal matrix 
of order r defined by 


-1 if Ii-dl=l, 

^id ~ ^ 0 otherwise. 

Now (l,1.15) can be converted [25l] to the matrix equation 
AX+XB=C where X is the sxr matrix (u^^) and C is also a 
matrix of order sxr having its first column as the first s 
coordinates of c, the second column as the next set of s 
coordinates of c and so on. Thus the system (1.1.14) involving 
a large matrix M has been equivalently written as a matrix 
equation involving matrices of lower dimensions. This suggests 
that it would be well worth developing algorithms for solving 
the Sylvester matrix: equation rather than its bigger matrix-vector 
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version of the form (l.l,l4) so that storage and computer time 
can be saved. 

¥e conclude this section with some notational conventions 
adhered to all through the thesis. 

The matrices are denoted by capital letters and, unless the 
contrary is stated, a matrix is assumed to be square of order n. 
The (i,a) element of A will be denoted by a^- ^ and we write 
A = (a-ij) • The identity matrix of order n is denoted by I and 
if it is of ^any other order say m, then it is denoted by Im- 
The diagonal matrix having dj_ as its i-th diagonal entry 
(i=l,...,n) will be denoted by diagCd^, . • • • The transpose 

of a matrix A will be denoted by A' or A^ and the conjugate 
transpose of A by A*. We write Re(A) to denote (A+A*)/2. 

The trace of A is denoted by tr(A) and the spectral radius of 
A by P(A) . 

We denote the class of all mxn real matrices by 
and the class of all nxn real matrices by M^OR) . The 
corresponding classes of complex matrices will be denoted 
simply by M and M , always denotes the n-dimensional 

lii«L ^ XX xJL 

real (complex) space of column vectors. We adopt the notation 
and to represent the sets of all nxn Hermitian and normal 

-f- 

matrices respectively. The symbol 2^ (2j^) consistently denotes 
the class of all positive (negative) definite Hermitian matrices 
in M^. With the exception of Sections 3 and 4 of Chapter 3, 
i.e., Sections 3.3 and 3.4 (in which A > (2) 0 denotes a matrix 
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having all positive (nonnegative) entries) , elsewhere P > {>) 0 
means that P is a positive definite (semidefinite) Hermitian 
matrix. To mark the end of a proof of a lemma, theorem or 
corollary we use AAA., 

1.2. Inertia Theory 

The notion of inertia of a matrix was introduced by 
Ostrowski and Schneider [7l] in order to generalize the 
celebrated result of Lyapionov mentioned in the preceding section 
and also to study various stability concepts like D-stability, 
H-stability arising in mathematical economics [l, Chapter 4]. 

The inertia In(A) of A s is defined as an ordered 
triple (tc(A), ^(a) , 5(a)), the entries denoting the total number 
of eigenvalues of A with positive, negative, and zero real parts 
respectively [7l]. We have, of course, 

^(A) + v(A) + 5(a) = n. (1,2.1) 

In view of the above definition, it is clear that A is stable 
iff In(A) = (0,n,0). Stable matrices are also termed as 
negative stable matrices and by a positive stable matrix we 
mean a matrix whose inertia is (n,0,0) . 

Theorems involving inertia of a matrix are called the 
inertia theorems. The purpose of this section is to give a 
brief account of various inertia theorems. Just recently, 

Cain [ 36 ] has published an excellent survey on inertia theory 
which also includes sane new results. We do not intend to 
repeat the material presented in that paper. However, we quote 
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certain Important results with an attempt to highlight material 
which does not feature prominantly in Cain's paper. 

The first known inertia theorem is the classical Sylvester's 
law of inertia. In the following theorem , each statement may be 
considered as Sylvester's inertia theorem. 

THEOREM 1,2.1 (see Cain [36], Carlson [39] s Carlson and 
Schneider [4?], Ostrowski and Schneider [7l] and Wielandt [SS]) . 
The following three statements are true and are equivalent to 
one another. 

(i) If H s Hn and S is nonsingular, then In(S*HS) = In(H) . 

(ii) If P > 0 and H e then In(PH) = In(H) . 

(iii) If AH > 0 and H & H^^, then In(A) = In(H) . 

Originally Sylvester' s theorem appeared in the language of 
quadratic forms (see Mirsky [l8, p.377]) . Rado [74] provides 
a generalization of Sylvester's theorem involving three 
quadratic forms whereas Schneider [78] gives a topological 
interpretation of Sylvester's theorem. 

Ostrowski [ 69 ] obtained a quantitative formulation of 
Theorem 1.2.l(i) with an extension to singular and rectangular 
cases in another paper [70]. Thompson [85,86] further 
generalized these results of Ostrowski, Details of these 
generalizations will be presented in the forthcoming chapter 
where we prove some more generalizations of Sylvester's theorem, 
including a generalization of Thcmipson's results to nomal 
matrices. 
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The second known inertia theorem is the classical result 
of Lyapunov, the one that we have seen in the previous section 
giving a nice characterization of stable matrices. In this 
context, we shall provide a list of characterizations of stable 
matrices in the form of a theorem, 

THEOREM 1.2,2. Let A s Then the following fifteen 

c ond it i ons are e qui val ent . 

(i) A is stable. 

(ii) All the solutions of the system x = Ax approach zero 
as t [4,6,7,36]. 

(Ill) = 0. 

t-»oo 

(iv) For a > 0, A-«I is nonsingular and the spectral radius 
of (A-aI)"^(A+aI) is less than unity [6]. 

(v) For a > 0, A-al is nonsingular and (A-al) ^(A+al) is 
convergent, i.e., { (A-al) ^(A+al)} -» 0 as k “ [20]. 

(vi) For a > 0, A-al is nonsingular and the sequence {xj^} 
k=0,l,2,... defined by the iterative scheme 
(A-al)xj^^^ = -(A+aI)x2^ + 2b 

converges to the solution of the linear system Ax = b 

for any initial guess x^ [20]. 

(vii) For C s there exists H s such that AH+HA = C 
(Lyapunov’s theorem) [4,7,13,71,83]. 

(viii) There exists H e such that AH+HA s E^ (partly 

weaker and partly stronger form of (vii)) [lO, 16,36, 
71,83]. 
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(ix) There exists H s such that Re(x*Ajfe) < 0, for all 
nonzero x s (C ^ [65]. 

(x) There exists H s "that the numerical range of 

AH is contained in the open left half plane [64]. 

(xi) A is similar to a matrix which can be expressed as the 
sum of a skew-Hermitian matrix and a negative definite 
Hermitian matrix [50]. 

(xii) There exist H s E^, Q s E^ and a skew-Hermitian 
matrix S such that A = (S+Q)H [3l]. 

(xiii) For C e E^, the matrix equation A H+HA=C has a solution 
and 0 ^ AY+YA* positive semidefinite implies 
tr(Y) < 0 [32]. 

(xiv) For a > 0, A-al is nonsingular and for every C s E^^ 
there exists H e E^ such that BHB*-H = C where 
B = (A-aI)’’^(A+aI) (Stein's theorem) [80,83]. 

4 * 

(xv) For a > 0, A-al is nonsingular and there exists H e E^^ 
such that BHB — H e E^^ where B is as in (xiv) (partly 
weaker and partly stronger form of (xiv)) [10,36,80,83]. 

In literature, there are many generalizations of Lyapunov's 
theorem. For example, Wong [94] proved that Lyapunov’s theorem 
is valid in the set up of operators. Joiinson [64] proved a 
Lyapunov theorem for angular cones which is a generalization 
of the formulation ( x) of Theorem 1.2.2. Taussky [8l] 
observed some connection between Lyapunov's theorem and 
D-stability. Also see Reid [76]. 
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Another landmark in the development of inertia theory is 
the following theorem, what we call the main inertia theorem, 
proved independently by Ostrowski and Schneider [7l] and 
Taussky [82] . 

THEOREM 1.2.3. Let A s M^. Then there exists H s 
such that Re (ah) > 0 iff 6(A)=0. Moreover, if H e and 
Re(AH) > 0, then In(A) = In(H) . 

An alternative proof of this important result is presented 
by Lancaster [23l] using projections. 

The interest in the main inertia theorem is that from this 
the two classical results on inertia due to Sylvester and 
Lyapunov can be easily deduced [36]. As another corollary 
of the main inertia theorem, Ostro^^)'ski and Schneider [7l] proved 
that if Re (a) >0 and H e then In(AH) = In(H) . This result, 
known as Wielandt’s theorem [88], is in fact equivalent to the 
second part of the main inertia theorem. 

Wimmer [89] has given a shorter proof of the second part 
of the main inertia theorem, by demonstrating that it is 
equivalent to the following result. 

THEOREM 1.2.4. Let H be a Hermitian matrix partitioned 

in the form ^m ^n-m* 

In(H) = (m,n-m,0) . 

The proof of this theorem essentially depends on Sylvester's 
theorem. Moreover in proving that Theorem 1.2.4 implies the 


I 
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second part of the main inertia theorem, Lyapunov’s theorem has 
been used. In view of this it seems to be more appropriate 
to say that the second part of the main inertia theorem is a 
consequence of Sylvester's and Lyapunov’s theorems. Rather 
it may be claimed that the second part of the main inertia 
theorem is equivalent to the two classical results, assuming 
the fundamental result on the existence and uniqueness of the 
solution of the Sylvester equation which will be stated in 
Section 1.4, 


It may be noted that Theorem 1.2.4 has appeared as a 
corollary to the following result proved by Haynsworth [5?]. 

THEOREM 1.2.5. Let H be a Hermitian matrix partitioned 

in the form (H. .) . . ^ ^ where H__ is nonsingular. Then 

ij 1,3=1, 2 11 

In(H) = InCH^^^) + In(K 22 ) (1.2.2) 


where 


■^2 = ^22 - «^ 2 « 5:]:«12 


(1.2.3) 


and the sum of the inertias is performed by componentwise 
addition. 


Some more results based on this interesting formula may be 
found in [45,57,58]. In the next chapter, we extend the 
above theorem to a partitioned normal matrix. 


We now turn to the generalizations and applications of the 
main inertia theorem discussed by various authors. In this 
connection, we need the concept of controllability of a pair of 
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matrices. If A s and B e then the pair (AjB) is 

said to be controllable [ 90 ] iff the rank of the nxnm matrix 
(B AB A^B . . . A^^^B) is n. In what follows we use In(H) < In(A) 
in the sense [4?] that itCH) ;< 7t(A) and '^(H) _< (A) . This 

definition is maintained even if H and A are square matrices of 
different orders. 

To start with, Carlson [37,38] and Carlson and Schneider 
[46,47] studied the main inertia theorem under the situation 
Re (ah) > 0 and arrived at the following theorem. 

THEOREM 1.2.6. Let A s and H e H^^. If 5 (a) =0 and 
Re(AH) 2 then In(H) < In(A) . If in addition 6(H) =0, then 
In(H) = In(A) . 

Snyders and Zakai [27l], Chen [48] and Wimmer [ 90 ] have 
shown that in Lyapunov* s theorem and in the main inertia 
theorem, we may replace "Re (AH) > 0" by "¥ = Re(AH) > 0 and 
(A,W) is controllable” . Hence we have 

THEOREM 1,2,7. If A s and H e H^^ such that 
Re(AH) =¥20 and (A,¥) is controllable, then 6(A) = 6(H) = 0 
and In(A) = In(H) . 

An extension of this result is given in ¥immer [275]. 

Using the concept of projection matrices and nilpotent matrices, 
¥immer and Ziebur [93] gave a unified treatment of results 
stated in Theorems 1,2,6 and 1.2,7, 

Motivated by Schneider* s theorem [ 77 ] which is a 
generalization of Lyapunov's and Stein’s theorems. Hill [59, 60 ] 
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has developed the inertia theory by considering the equation 


2 g. .A.H aJ = 
i » j =1 ^0 1 3 


(1.2.4) 


where P s H e and G = (Sij) ® A2,...»Ag are 

quasi-commutative, i.e., each of A2,...,Ag commutes with 
AiAj - ■^j'^i (i» j=l» * • • »s) and obtained generalizations of many 
inertia theorems. For G = [^ qJ, A^ = I, A 2 = A, Hill^ s 
theory gives Lyapunov's theorem and the main inertia theorem. 

For G = diag(l,-l), A^ = 1? ^2 ~ gives Stein's theorem 

and the Stein analogue of the main inertia theorem. By setting 
G = diag(l,-l,-ls, . . . ,-l) , A^ = I, A^^^ = C^, i=l,...,s-l, Hill's 
theory becomes Schneider's theory [77]. 


By means of a generalized notion of controllability, 

Carlson and Hill [43] extended Theorem 1.2.7 to Hill's setting 
given by (1.2.4). Wimmer [92] has considered the specialized 
form of (1.2.4) with s=n and A-^ = A^"*^, k=l,,..,s with P 2 
to obtain some generalizations of the Lyapunov and Stein 
theorems, Howland [63] attempted to generalize the main inertia 
theorem to the equation 


n-l 

2 

i,0=0 




gijA-HA 



P 


(1.2,5) 


where H s H^^ and P > 0. Later, Chen [49] disproved Howland's 
result and obtained some inertia theorems for a less general 
form than (l.2,5) . Pointing out that Chen's results were also 
false, Carlson and Hill [44] corrected them to prove some 
generalizations of the main inertia theorem. 
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The simplified version of Stein *'s theorem says that for 
B s P(B) < 1 iff there exists H s such that BHB*-H c E". 
By applying the Cayley transform B = (A-I) ""^(A+I) , it was 
shown by Taussky [83] that Stein’s theorem is equivalent to 
Lyapunov's theorem. By the same technique, the following Stein 
analogue of the main inertia theorem follows. In order to 
state the theorem, we define, for B e Mj^, 

InA(B) = (7tA(B), va(B), 6^(B)) 

where the entries in the triplet denote the total number of 
eigenvalues of B lying respectively outside, inside, and on the 
unit circl e . 

THEOREM 1.2.8 (see Cain [36], Hill [60] and Wimmer [89]). 
Let B e M^. Then there exists H s H^ such that BHB*-H > 0 

iff 6^(B)=0. Moreover, if H s H^ and BHB*-H > 0, then 

IuaCb) = In(H) . 

The Stein analogues of Theorems 1.2.6 and 1,2.7 are given 
in Datta [52] and Wimmer and Ziebur [93] respectively. 

A great deal of effort has been devoted by Cain [33-36], 
to generalize almost all the foregoing results to infinite 
dimensional setting. These generalizations are extensively 
discussed in a very recent and expository survey paper on 
inertia theory by Cain [36]. 

Hill [6l] makes some interesting applications of the 
main inertia theorem to obtain necessary and sufficient 
conditions for a matrix to have no eigenvalue on any arbitrary 
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circle, line and ceortain other curves in the complex plane. 

As a corollary to the main inertia theorem, Joyce and Barnett 
[67] proved that In(A) = In(H) if A = H ^(S+C) where H e 
C > 0 and S is skew-Hermitian. In the same paper, a sufficient 
condition on B such that In(A+B) = In(A) has been obtained. 

It has been shown by Cain [36] that In(AH) = In(H) for 
every H e iff Re(A) >0. In this result, one part is 
Wielandt's theorem [ss] that we have seen earlier and the 
other part follows immediately from Remark 2 of Carlson’s 
paper [40] on H-stability. 

Other studies on inertia of matrices include the following. 
Inertia theorems for tridiagonal matrices are discussed in 
Carlson and Datta [4l], Datta [5l], Schwarz [79] and Wimmer [ 9 I] 
and for Hessenberg matrices in Datta [53]. 

Johnson [66] has determined the precise set of possible 
inertias of product of two nonsingular Hermit ian matrices with 
known inertia. Loewy [68] characterized all inertia triples 
(a,b,c) that are attained by AH+HA* as H varies over the set of 
all nxn positive semidefinite Hermitian matrices. The same 
study was carried out for positive definite case by Avraham and 
Loewy [25]. The class of 3x3 real matrices M such that 
In(MD) = In(M) for all positive diagonal D is characterized by 
Bahl and Cain [26]. Palmer [72] gives sufficient conditions 
for an n X n matrix to have 5(A) =0. 
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Next, we shall say a few words about the computation 
of inertia of a matrix. Of course, one way of computing the 
inertia of A is to find the eigenvalues of A, directly, using 
an efficient algorithm [22] and then to couint the number of 
eigenvalues having positive, negative, and zero real parts. 
Without computing the eigenvalues also there are effective 
algorithms to determine the inertia. It has been shown by 
Barnett [29] that matrix sign function [54] can be used to 
compute the inertia of a matrix. If A = SJS“^ where J is the 
quasi-diagonal matrix diag(J^,J_, J^) with J_, Jq representing 
the direct sums of the Jordan blocks corresponding to the 
elementary divisors associated with eigenvalues having positive, 
negative, and zero real parts respectively [7, Vol.l], then 
the matrix sign function sgn(A) of A is defined as SDS*^ where 
D is the diagonal matrix having the first 7t(A) diagonal entries 
as 1, the next '^(A) diagonal entries as -1, and the last 5 (a) 
diagonal entries as 0. If A is dichotomic, i.e., if 5 (a) = 0, 
then sgn(A) is computed recursively [54,73] by 

\+l ^ ^-^k ■^k^V2, k=0,l,2 ,... with Aq=A. (1.2.6) 

Hence it follows that 

7t(A) = (n+tr(E))/2 and "^(a) = (n-tr(E))/2 (l.2,7) 

where 

E = sgn(A) = Ak. (1.2.8) 

In [42] , Carlson and Datta have described an effective 
computational procedure to compute the iner*tia of non-Hermitian 
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matrices. The method is based on the reduction of the given 
matrix A to a lower Hessenberg matrix with tmit codiagonal and 
then constrnacting a Hermit ian matrix H whose inertia is the 
same as that of the transformed matrix. An earlier related 
paper is by Meyer-Spasche [246], 

Apart from the results outlined above, there are many 
other interesting results in inertia theory of matrices. The 
references given under the title "inertia theory" at the end 
of the thesis may be consulted for further details. 

Finally let us give a brief account of our work related 
to inertia theory carried out in the present thesis. ¥e 
introduce a notion of angularity of a matrix (due independently 
to Cain [36] and Rathore and Chetty [75]) as a generalization 
of the concept of inertia and prove some angularity theorems 
concerned with normal matrices. These results include some 
well-known inertia theorems as special cases. Also we 
determine the structure of inertia- and angularity-preserving 
linear transformations on Hermitian, normal and circiolant 
matrices , 

1,3. Linear Transformations with Invariants 

In recent years, it has been of considerable interest to 
study the set L(f,S^ of all linear transformations T; -* Mjj 
such that T(Sjj) £ and f(T(A)) = f(A) for all A c where 
is some given subset of for instance, the class of all 
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Hermitian matrices and f is some given function on 
e.g., det(A) . In case T: -* has to satisfy only 

T(Sj^) c then we shall denote the set of all linear 
transformations T by L(.,Sjj) , 

Within the last two decades a considerable amount of work 
has been done to determine the stmcture of these sets L. In 
most of the cases we see that T has the form 

T(A) = UAV for all A s (l.3.l) 

or 

T(A) = UA’V for all A s (l,3.2) 

for suitable U, V s where A' denotes the transpose of A. 

In this section we present a concise review of the work 
done till 1980 in the field of linear transformations on 
matrices. First we mention that there are two excellent 
expository surveys [117,118] in this area due to Marcus who has 
made a notable contribution to this sufficiently well-established 
aspect of linear algebra. These two survey papers may be 
consulted for fiorther details and references of earlier work 
in the field. 

According to Marcus [lis], probably Frobenius initiated 
the study of determining the structure of L(f,Sjj) in 1897, by 
proving the following result. 

THEOREM 1.3.1. Let T; M^^ M^^ be linear such that 

det(T(A)) = det(A) . Then T has the form (l.3.l) or (1.3.2) 

where det(UV)=l. 
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After a lapse of 28 years, the above result of Frobenius 
was extended and improved by Schur who considered the linear 
transformations Tj involving determinants of 

submatrices. Later, Marcus and May [l2l] reformulated Schur^ s 
result in terms of r-th compound matrix C^(A) and gave an 
alternative proof. ¥e recall that if A s then C^Ca) is 

the (p) X (p) matrix whose elements are the r-th order 
sub determ inants of A arranged in doubly lexicographic order 
[l6, p,l6]. The given alternative proof depends upon the 
following important result due to Marcus and Moyls [124] . 

THEOREM 1.3,2, If T ; ^ - M^ ^ is linear and maps rank 

one matrices to rank one matrices then there exist nonsingular 
matrices U and V in and Mj^ respectively such that for m;^n 

T(A) = UAV for all A s (1.3.3) 

and for m=n, T has the form (1,3.3) or 

T(A) = UA’V for all A s (1.3.4) 

The theorem just mentioned was proved using multilinear 
algebra techniques. Later it has been reproved by Mine [l29] 
using only elementary matrix theory. Also, in the same 
paper Mine shows that if T preserves the determinant it also 
preserves rank 1 and thus deduces the classical result of 
Frobenius, stated in Theorem 1.3.1. 

Details of further extensions and applications of rank 1 
preservers can be seen in [117,118]. 
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Linear transformations mapping the set of rank k matrices 
into itself has been studied by Djokovic [lOs]. In [95 >96], 
Beasley has given a variety of sufficient conditions under 
whioh the linear map T; ^,n P^^eserving rank k matrices 

has precisely the same form given in Theorem 1.3.2. To sample, 
some of the sufficient conditions are 

(i) T is nonsingular 

(ii) n > k^ + k 

(iii) min(m,n) = k 

(iv) k = 3. 

A closely related paper about rank k preservers on the space 
of symmetrio matrices is written by Lim [ll3]. 

Another problem to which many of the results concerning 
L(f,Sj^) can be reduced is to determine the structure of 
L(rank,M^) . It has been proved by Marcus and Moyls [l25] 
through a sequenoe of lemmas that T e L(rank,Mjj) has the form 
given in Theorem 1.3.2. 

A related problem is to find L(.,Sjj) with Sj^ as the class 
of all nonsingular matrices in It has been proved by 

Marcus and Purves [l26] that in this case also the conclusion 
is as in Theorem 1.3.2. Much earlier, Dieudonne [lO?] 
concluded the same by taking as the class of all singular 
matrices in M^ and assuming that T is nonsingular. Recently 
in 1978, Botta [ 103 ] donnected the above two resxilts by 
deriving the result pertaining to nonsingular matrix preservers 
frcm the result corresponding to singular matrix preservers. 
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The structure of L(rarLk,Sn) -with as the class of all 
nxn skew-symmetric matrices has been discussed in Lim [115]. 

One of the interesting choices of invariant functions in 
the study of linear preserver problems is the r-th elementary 
symmetric function of the eigenvalues of A e It 

may be noted that E 2 _(A) = tr(A) and E^Ca) = det(A) , In this 
connection, Marcus and Purves £l26] established the following 
in 1959. 

THEOREM 1,3.3. ^ i ^ £ n-1, then T <s L(E^,M^) has the 

form (1.3.1) or (1.3.2) where 

UV = e%, s 0 (mod 2n) . (1.3.5) 

This resiiLt says that if 4 ;< r < n-1 and Ep(T(A)) = Ej,(A) 
for all A s then the linear map T: is essentially 

(modiolo taking the transpose and multiplying by a constant) 
a similarity transformation. This result is not valid for 
r=l,2 and for this coimter examples have been provided in the 
same paper. Also for r=n, the above result does not hold. 

Of course, the latter case corresponds to Theorem 1.3.1. 

Moreover, this case has been discussed separately by Marcus and 
Moyls [ 125 ] also. The only unsettled case was 3 p= 3. Tn 1970, 
this was also settled nicely by Beasley C95»97] who proved by 
an ingenious argument that Theoran 1.3*3 holds for r=3 also. 

Kovacs [ 112 ] characterized trace-preserving linear maps 
and also Ej (A) -cum-E 2 (A) -preserving linear maps. The linear 
transfoimations on the space of nxn skew-S 3 niunetric real matrices 



24 


preserving have been treated by Marcus and Westwick [12?]. 

The problem of determining the structure of L(h,Mj^) with 
the choice of h(A) as the completely symmetric polynomial in 
eigenvalues of A has been considered in Marcus and Holmes [l20j 
whereas for a large class of polynomi^s h, the structure of L 
has been studied by Rackusin and Watkins [l40]. 

By taking f(A) as the trace of the positive semidefinite 
square root of A*A (i.e., the sum of the singular values of A), 
Russo [l42] proved that a linear transformation Ts — M^ 
satisfying T(I)=I and f(T(A)) = f(A) for all A «5 has the form 
(1,3.1) or (1,3.2) with V = U* and U is -unitary. 

Defining 0p(A) as the r-th elementary symmetric function 
of the eigenvalues of A*A, i.e., Ej,(A*A) , Marcus and Mine [l23] 
obtained for 1 < r £ n the structure of all linear transformationi 
^,n \i,n satisfying 0 j,(T(A)) = 0^(A) as the one given in 
Theorem 1.3 *2, with U and V \jinitary instead of nonsingular. 

If Ts Mj^ -*■ M^ is linear it will be interesting to note that 
T preserves eigenvalues for all matrices in M^^ iff it preserves 
eigenvalues for all Hermitian matrices in M^^ [l25]. Moreover 
in this case T is of the form (l.3.l) or (1.3.2) with UV=I. 
Consequently, the following theorem was proved by Marcus and 
Moyls [125]. In the sequel, ev(A) denotes the set of n 
eigenvalues of A including multiplicities. 

THEOREM 1.3.4. Let T: M^ be linear. If T(Hn) c H^^ 

and ev(T(H)) = ev(H) for all H s H^^ then there exists a 
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imitary U s such that 

T(A) = UAU* for all A e (1.3.6) 

or 

T(A) = Ua'U* for all A s (1.3.7) 

¥e will make use of this result in order to prove one of 
our main results in Chapter 3. 

By characterizing the linear transformation Ts 
which leaves both trace and determinant of each matrix A 
invariant as essentially the similarity transformation of 
either A or A' , Mine [l28] showed that T c L(f ,Aj^) , where 
f (a) is ev(A) and is the class of all nonnegative matrices 
in (i.e., matrices all of whose entries are nonnegative), is 
of the form 

T(A) = P”^AP for all A s Mn (1.3.8) 

or 

T(A) = P'Vp for all A e Mn (1.3.9) 

where P is a nonnegative generalized permutation matrix. 

A s Mjj is said to be a generalized permutation matrix if it 
has precisely one nonzero entry in each row and in each 
column [l28] . In a generalized permutation matrix if all 
the n nonzero entries are 1, then it is a permutation matrix 
and if all the n nonzero entries are positive, it is called 
a nonnegative generalized permutation matrix. 

Next, we are concerned with permanent-preservers. 

If S is the symmetric group of degree n, then the permanent 
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of A = (sfj) ^n» denoted by per(A), is defined [l22] by 

per(A) = 2 7T a- (1.3.10) 

cyeSi=l icfU; • 

To begin with, Marcus and May [l22] proved that for n> 3, 
the linear transformation Tj Mj^ -♦ Mj^ such that per(T(A) )=per(A) 
for all A s Mj^ has the form 

T(A) = DPAQL for all A e Mn (1.3.11) 

or 

T(A) » DPA'QL for all A c M^ (1.3.12) 

where P, Q are permutation matrices, D, L are diagonal matrices 
such that per(DL)=l. This result was arrived at after proving 
nine lemmas. Botta £lOO] has proved the same resiflt in a 
somewhat more direct way. Moyls, Marcus and Mine [130] 
studied the permanent-preservers on the space of doubly 
stochastic matrices whereas Lim and Ong [ll4] studied the same 
on the space of real S 3 mimetric matrices. In [l36]. Pierce 
considered discriminant-preserving linear maps. 

The structure of L(f ,Mj^) with f as the generalized matrix 

function in the sense of Schur, i.e., 

n 

f(A) = X(a) TT A= (a.J cM (1.3.13) 

creG lovi; -'-J ^ 

where X is a nonzero function defined on a subgroup G of the 
symmetric group acting on {l,,..,n}, has been investigated in 
Botta [ 101 , 102 ] and Ong [l32]. Ong and Botta [l33] and Ong 
[ 131 ] studied linear maps preserving the class of generalized 
permutation matrices. 
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In 1976 , Watkins [l44] proved that for n 2 
nonsingular linear map Tj satisfying "AB=BA implies 

T(A)T(B)=T(B)T(A) for all A and B in (i.e., T preserving 
commuting pairs of matrices) has the form 

T(A) = cU-^AU + g(A)I for all A e (l.3.l4) 

or 

T(A) = cU^Vu + g(A)I for all A s (1.3.15) 

for some scalar c, nonsingular matrix U and linear functional g. 
The invalidity of this result for n=2 "was showi hy means of 
counterexample. Once again it was Beasley [98] who settled the 
problem by showing that the result of Watkins holds for n=3. 

The k-th numerical range of A s denoted by Wj^(A) 
is defined by 

k 

Wj^(A) = {^Z^(Av^,v^)} (1.3.16) 

where {v^,..,>Vj^} runs through all orthonormal sets. The k-th 
decomposable numerical range of A is defined to be the set 

W^(A) = {det(X*AX)s X s and det(X*X)=l}. (1.3.17) 

It may be observed that in both cases k=l gives the classical 
numerical range. Pierce and Watkins [l37] considered the 
problem of determiining all linear transformations T: "*■ 

which preserve Wj^(A) . Just recently, Marcus and Filippenko 

A 

[119] treated the corresponding problem for Wj^(A) and proved 


that T is of the form 
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T(A) = CU*AU for all A <s (1.3.18) 

or 

T(A) = CU*a'U for all A e (1.3.19) 

where ? is a complex k-th root of unity and U is unitary. 

This result is a generalization of a result of Pellegrini £l34] 
who characterized the linear operators preserving the classical 
numerical range. 

Next we come to unitary-preserving maps. It is proved by 
Marcus [ll6] that T e L(. 5 Uj 3 ^), where is the class of all 
nxn unitary matrices, is of the form (l.3.l) or (1.3.2) where 
U and V s U^. Botta [l04] gives a new proof of this result 
under the special assumptions that T is nonsingular and n ^ 3. 
It was conjectured by Marcus [ll8. Conjecture 6] that if 
T: M^OR) — M^CR) is linear such that T maps the real orthogonal 
group G into itself then T must have the form (l,3.l) or (1.3.2) 
in which U, V belong to G. Wei [l45] showed that this 
conjecture holds except for n = 2, 4, or, 8 and that in the 
exceptional cases there exist singular maps. Linear operators 
preseorving certain algebraic groups are discussed in Pierce 
[ 135 ] and Botta and Pierce [l05]. 

A linear transformation T; M^ - ]y^ is said to be 
Hermitian-pre serving iff T(H|^) c de Pillis [l38] shows 

that Hermit ian-preserving linear transformations T have the 
structirre given by 

Ts A ** Z a^X^A’x^ 
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where each is real and is a certain matrix in 
depending on T. Hill [llO] gives three sets of necessary and 
sufficient conditions for T to be Hemitian-preserving. Some 
more papers related to Hermitian-preserving maps are Choi [l 06 ] 
and Poluikis and Hill [l39]. 

It is shown by Schneider [773 that a linear transformation 
T on the real space H^ of nxn Hermitian matrices (i.e., a space 
in which a matrix may be multiplied only by a real scalar) 
taking the cone of positive semidefinite matrices onto itself 
preserves rank. It is fiirther shown that in this case there 
exists a nonsingular C such that 

T(H) = CHC* for all H s (1.3.20) 

or 

T(H) = CH'c'' for all H c H^. (1.3.21) 

If T is a linear transformation on the space of nxn real 
symmetric matrices S^ into itself sending real positive definite 
matrices into themselves and satisfying det(T(A)) = c(det(A)) 
where c is a nonzero real constant, then Eaton [IO 9 ] proves that 
there exists a real nonsingular matrix M such that 

T(A) = mam’ for all A s (1.3.22) 

Sinkhom [l43] and Benson [99] characterized T s L(., 
where represents the class of all generalized doubly 
stochastic matrices in Mj^ (i.e., nxn complex matrices whose 
row and colxmin sums are l) as linear combinations of functions 
of the type 


T(X)= AXB 


(1.3.23) 
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with restrictions posed on the nxn matrices A and B. 

It has been shown by Beasley [95] that linear maps 
Tj preserving minimal polynomial has the form (l.3.l) 

or (1.3.2) with UV = I, The structure of linear maps that 
preserve matrices annihilated by a polynomial is the study of 
Howard [ill]. 

Recently, Robinson [l4l] has shown that results 
characterizing the structure of L(rank,M^) , L(det,M^) and 
L(ev,M^) can have direct generalizations to multilinear functions 
on Mjj, Specifically, if f(X) is the rank, determinant or the 
set of eigenvalues including miiltiplicities of X and m is a 
positive integer, the set of m-linear functions T from the 
Cartesian product x ... x (m copies) to satisfying 

f(T(X^,...,X^)) = for all Xj_,...,X^ s (1.3.24) 

has been determined. 

To get an overall picture about the structure of 
or L(.,Sj^) for various f and Sj^, we shall tabulate below the 
results presented in the survey for which Tj ]y^ -► is of the 
form (1,3.1) or (1.3.2). In the table, we use the following 
notation. R]_ is the set of all rank 1 matrices, is the set 
of all nonsingular matrices, SL^ is the set of all singular 
matrices, is the set of all unitary matrices, is the set 
of all nonnegative matrices, is the set of all permutation 
matrices, is the set of all nonnegative generalized 

permutation matrices and is the set of all diagonal matrices. 
All these sets are subsets in 



TABLE 1.3.1 

Str^ictures of linear transformations on matrices 
"With some invariants 
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S.No. f(A) 

Sn 

Additional 

assumption 

Condition on 

U and V in 
(1.3.1) and (1.3.2) 

1. det(A) 

Mn 

- 

det(UV) = 1 

2. 

% 

- 

U,V s GLjj 

3. rank(A) 

^n 

- 

U,V e GL^ 

4. 


- 

U,V « QL^ 

5o 

SLv. 

T is 

U,V c (X„ 



nonsingular 

^ n 

6. E^^CA) 


3 <. r £ n-1 

UV = 

r0 = 0 (mod 2n) 

7. tr{(A*A)^^^} 


T(I) = I 

UV = U s 

8. E^(A*A) 


1 < r < n 

U,V e GL^ 

9. ev(A) 


- 

UV = I 

10. ev(A) 

Hn 

- 

UV = I, U e U^ 

11, trace-cxim- 
determ inant 

M 

n 

- 

UV = I 

12. ev(A) 


- 

UV = I, U c NGPj^ 

13 . per(A) 


n > 3 

U = DP, V = QL 
where P,Q s Pj^ 

D,L e D^ such that 
per(DL) = 1 

1 

« 

<|- 

H 

Un 

- 

U,V e 

15 . minimal 

polynomial 
of A 

^n 

- 

m = 1 
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We may also mention here that in Chapter 3 of the present 
thesis we determine the classes L(In,H^) , L(0j,Njj)j L(lnyNj^), 
L(6,Cj^) and L(ln,C^) where In and 0 stand for inertia and 
angularity respectively and denote respectively the 

classes of all Hermitian, normal and circulant matrices in 
Also we determine the structure of linear transformations on^R^ 
and preserving certain qualitative and quantitative invariants, 

1.4, The Matrix Equation AX+XB=C 

Our concern in this section is with the linear matrix 
equation 

AX + XB = C (1.4.1) 

where A, B, C are known complex matrices of order mxm, 
nxn and mxn respectively. It may he observed that A and B 
shoiILd necessarily be square matrices for the above equation 
to be conformable. In literature » the equation (l,4,l) is 

referred as the Sylvester equation and for the choice B=A*, 
it is well known as the Lyapunov matrix equation. 

During the past quarter of a century, these two equations? 
with more emphasis on real type, have received a great deal of 
attention because of their practical importance in a variety 
of problems, especially in control theory. The solutions of 
these types of equations are required in 

(i) solving by finite difference discretization, some 
boundary value problems in partial differential 
equations which occur, for example, in potential 
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theory or when finding the stress in a helical 
spring [ 173 , 211 , 212 ], 

(ii) the analysis of beam gridworks with various boundary 
condit i ons [ 239 ] , 

(iii) the study of certain type of linear ordinary 

differential system with constant coefficients [l84], 

(iv) the investigation of the stability of time-invariant 
systems [170,223], 

(v) the calculation of (jiadratic performance indices for 
linear time-invariant systems [ 169,223], 

(vi) the calculation of mean-square functionals of linear 
time-invariant systems [224], 

(vii) the calculation of a large class of functionals of 

the time and frequency response of a linear, constant 
coefficient dynamical system [240], 

Cviii) estimating shoots and settling times and deriving 
pseudo-optimal control policies for the vector u 
which will return the system with control to 
equilibrium as quickly as possible following an 
initial disturbance [254], 

(ix) the construction of Luenberger observers for linear 
time-invariant multivariable systems [ 236 ], 

(x) sensitivity analysis of optimal linear control 
systems to small variations in parameters [l53], 

(xi) the evaluation of ISE (integral of the square error) 
of certain control systems [l78] , 



34 


(xii) simplifica'tion of large d 3 ^ainical systems [178}, and 
(xiii) the computation of inertia of certain type of matrices 
[42,246]. 

For some more applications one may refer Barnett and 
Storey [l60], Chidambara and Viswanadham [l78] and Rothschild 
and Jameson [264]. 

The Sylvester equation and its simpler more structured 
cousin., namely, the Lyapunov equation have been thoroughly 
studied and the literature in this area is quite extensive. 

Our objective is to give a comprehensive account of existence 
theorems, various methods of solutions, different types of 
solutions including explicit solutions and numerical solutions, 
sensitivity of solution, comparative studies made on several 
algorithms and some generalized forms of the matrix equation 
AX+XB=C. A guide to our survey in this area is the excellent 
treatment of Lancaster [23l] in this direction. Other papers 
worth noting in this connection are those of Kucera [228] who 
has presented a comprehensive th©ory of AX+XB=G and of Barnett 
and Storey [l60] concerning various solution methods for the 
Lyapuncvmatrix equation. 

¥e shall begin our survey with various existence theorems. 
The straightforward approach to solve (l.4.l) is to transform 
it to an equivalent vector form. To facilitate this study, 
we need the concepts of column string of a matrix [273] and the 
Kronecker product of two matrices (see, e.g. , Bellman [4, p.235] 
and Lancaster [l3, p.256]) . 
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IfX = (x..) sM , then the column string of X, •written 
13 in n 

cs(X) is defined as the mn-dimensional column vector 

T 

(x^^, . . . • * * *^m2» »^n" * * * 

Neudecker [25l] refers the column string of X as the vector of X 
denoted hy vec(X) « 


If A = (ajL^) s and B = ^^ij) ® ^,q Kronecker 

product (also known as tensor product or direct product) of A 
and B, denoted by A(g! B, is defined to be the partitioned 
matrix (a.-.B) sM (i=l,...,m, o=l,...,n). 

J-J 

Some elementary but interesting properties of the Kronecker 
product follcv^^. In (l) and (3 ) , A and B may be rectangular 
matrices of any oixier whereas in the rOTaining it is assumed 
that A s B s M^. 

(1) (A0 B)^ = A^ B^ 

(2) (A^B)“^ = A”^ if A and B are nonsingular 

(3) rank(A(SS B) = rank ( A) rank (B) 

(4) tr(A0 B) = tr(A)tr(B) 

(5) det(A0 B) = (det(A))^(det(B))°^ 


The following important classical resiuLt on the eigenvalues 

of the Kronecker products is given in Lancaster [l3, p.259]. 

P • - 

Let 0(x,y) = 2 c,-^x^^ be a pol3momial in x and y for 

i,d=0 

certain complex numbers Then for A s and B s the 


Kronecker polynomial 0(A,B) is defined as 2 c^ 

i,d=0 

0 0 

where as usual A = 3^ and B = Then if 


(X 


m 


are 
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the eigenvalues of A and are those of B, then the 

eigenvalues of 0(A,B) are the mn numbers i=l,,,.,m, 

0=1,..,, n. In particular the eigenvalues of A^^B are 
i=l, . , . ,m, 0=1 » • . • jH and consequently P(A(^B) = P(a) P(B) , 

The following result connecting the column string and the 
Kronecker product is given in Vetter [273] (also see Marcus and 
Mine [l6, p.9]) . If A, B, X are matrices of any order so that 
AXB is defined then 

cs(AXB) = (B^@A)cs(X). (1.4.2) 

Hence, by making use of this formula, we find that the 
Sylvester equation ( 1,4.1) assumes an equivalent form 

Gx = c (1.4, 3) 

where 

G= (I„@A) + X = cs(X) , c = cs(C) . 

Since the eigenvalues of G are a.+p., i=l,...,m, o=l>»*-»n 

X J 

where are the eigenvalues of A and 

those of B, one has the following basic existence and uniqueness 
theorem about the Sylvester equation. 

THEOREM 1.4.1. The solution of the Sylvester equation 
AX+XB=C exists and is unique iff A and -B do not have a common 
eigenvalue . 

Alternative proofs for this important theorem are 
available in the classical texts of Gantmacher [7, Vol.l] and 
MacDuffee [l5] . For a simpler and interesting proof, see 
Ostrowski and Schneider [71]. 
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In view of (l.4,3)» we conclude that (l.4,l) has a 
solution iff 

rank(G [ cs(C)) = rank(G) , (1,4.4) 

Analogous existence theorem is given by Roth [262] even for 
the more general linear matrix equation 

S A.X.B, = C (1.4.5) 

X X 1 1 

and in particular for 

E = C. (1.4.6) 

One may feel the difficulty in expressing the condition given 
in (1,4,4) in terms of A, B, C explicitly. Moreover, the 

matrix G involved here is of order mnxmn. In another paper, 

involving only matrices of order (m+n) , Roth [ 263 ] has given a 
necessary and sufficient condition for (l.4,l) to have a 
solution. The result is as follows. 

THEOREM 1.4.2. Let A s Mj^, B e M^ and C s Then 

there -exists X s ]y^ ^ such that AX+XB=C iff the matrices g] 
[q are similar. 

In facty Roth has proved this theorem only for the case m=n, 
though this holds in general. Another result associated with 
the above theorem says that for A e B g and C s 

there exist X s M^ ^ and Y s ^ such that AX+YB=C’ iff 
[q and [q _g] are equivalent in the sense that one can be 
obtained from the other by means of a sequence of elementaiy 
row and column operations. 
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The above two res\ilts, known as Roth’s similarity theorem 
and Roth’s equivalence theorem, have been discussed by several 
authors [148,187,188,195,196,201,218,228] either by giving 
alternative proof or by extending these results over certain 
rings, 

Johnson and Newman [216] proved that if A and B are 

A C 

diagonalizable then AX+XB=C has a solution iff [q __g] 
is diagonalizable. It may be noted that this result is an 
immediate corollary of Roth' s similarity theorem. 

In [228], Kucera obtained two existence theorems regarding 
the equation (l,4,l), one involving linear transformations and 
the other one being Roth’ s similarity theorem. Criterion for 
the existence of a solution of the Sylvester equation is given 
in terms of pseudo-inverse of matrices by Boullion and Poole 
[ 175 ]. Some more remarks on existence and uniqueness theorems 

t 

are given in Lancaster [23l]. 

Next we shall deal with various methods for solving the 
Sylvester equation. The basic method is to solve the 
transformed equation (1.4.3) by any standard method (see e.g., 
Westlake [ 21 ] and Young [23]) . This direct approach known as 
full inversion method [264] is satisfactory only for small 
values of m and n, say less than 10. Since the computer time 
and storage requirements of this method increase with (mn) 
and (mn) respectively, this method is of no value even for 
modest values of m and n. Thus in order to solve AX+XB=C 
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many authors have suggested special methods which involve 
operations only with matrices of order m and n. 

The best known such methods are based on the transformation 
of A and B to simple forms (for example, companion form, Jordan 
form, Hessenberg form, Schwarz form [223] etc.) via similarity 
transformations 

A = U'^AU, S = V"^BV. (1.4.7) 

Now (l.4,l) reduces to 

AY + YB = C (1.4.8) 

where Y = U“^XV and C = U“^CV, In view of the special 
structures of A and B, solving (1.4.8) may be shown to be simple 
and therefrom X can be obtained easily. 

The well-established algorithm of Bartels and Stewart Il64] 
and its recent improvement by Golub, Nash and Van Loan [l92] are 
examples of the above method. Key to the technique of the 
Bairtels-Stewart algorithm is the Schur reduction to triangular 
form by orthogonal similarity transformations using standard 
procedures of Householder's method and the OR algorithm. More 
specifically, A is in block lower (upper) triangular form 
and B is in block upper (lower) triangular form where in both 
cases each block is of order at most two. Now the structure 
of transformed equation (1.4.8) allows the solution process 
to be decoupled and reduced to a succession of the Sylvester 
equation problems with matrices of order 2x2 at most. Then 
each such equation can be solved by direct approach, that is 
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by full inversion method explained earlier. A special feature 
of the Bartels-Stewart algorithm is that it contains a package 
of FORTRAN IV program with brief description of the subroutines. 

The above algorithm requires approximately (2+4cr) (m^+n^) + 
^(mn^+nm^) multiplications where <y is the number of OR steps 
involved in the process. The storage requirement for this 

p p 

method is 2m +2n +mn locations. 

Since the computer time required mamLy depends on the 
number of multiplicative operations, only multiplications are 
considered in the operation count. Moreover all the counts 
are made on the assumption that the matrices concerned are all 
real. 

Recently, Golub et al. [l92] and Enright [l85] proposed 
independently a modification over the Bart els -Stewart algorithm 
by keeping A in Hessenberg form itself. The rest of the 
procedure is as in the Bartel s-Stewart algorithm. Assuming 
that the Schur reducrtion of B requires lOn^ operations, the work 
count for the new method, called the Hessenberg-Schur algorithm 
[ 192 ], has been calculated as -^m^+lOn^+^m^n+Imn^. Although 
the storage requirements of this modified method exceeds that 
of the Bartels-Stewart algorithm by m^ locations, it has been 
shown in [l92] that it is considerably faster than its 
nearest competitor, the Bartels-Stewart algorithm. The 
stability of the new method is demonstrated throu^ a roundoff 
error analysis and supported by numerical tests. 
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The transformation approach of solving the Sylvester 
equation has been discussed by several authors . Gantmacher 
[7, V0I.I5 Chapter VIIl] uses this approach to study the 
general solution of (l.4.l) . Molinari [249] considers 
A and B in companion form and estimates the number of operations 
required as about 2(m+n) (m^+n^) and storage needed as 6m^+6m 
locations, Thou^ this procedure is countwise favourable, 
there is a possibility of obtaining ill-conditioned 
transformation matrices [192], The approach of Guidorzi 
[194] deals with general canonical forms and it is suggested 
that the transformed equation (1.4.8) may be solved using 
direct approach by triangul arizing the almost triangular matrix 
of order mn. Kreisselmeier' s method [22?] involves a 
transformation of B (or A^) to Hessenberg form, an mxm 
(or nxn) matrix inversion and a recursive algorithm. 

Treating A and B as Jordan canonical forms, Ma [239] obtains 
a solution involving a finite double matrix series. 

The transformation technique can also be used to study 
certain explicit solutions of AX+XB=C. In (1.4.8), if 
Y = (yjj)* C = (Cj_^), A = diag(a2,...,ajn) and B = diag(p2 » . . . ,^j^) 
then it follows that 

provided 0. This shows that if A and B are 

diagonalizable and A and -B do not have a common eigenvalue 
then the unique solution of (1.4.1) can be expressed in terms 
of eigenvalues and eigenvectors of A and B [l73»215]. 
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It is, however, possible to obtain an explicit solution 
without recourse to diagonal ization. It has been established 
by Bickley and McNamee [l73] that if the characteristic 
pol 3 niomial of B is x^-p 2 _x^“^+. ,,+(-!) ^p^^, then the solution to 
the Sylvester equation is given explicitly as 

X = (1.4.10) 

where 

G = + ... + p^Ijj^ (1.4.11) 

and 

= B^ - p^B^“^ + ... + (-l)^P 2 ,^ji» r=0,l, , . .,n.CL.4.12) 

Similarly X can be expressed depending on the characteristic 
polynomial of A. Explicit solutions of this nature involving 
the coefficients of characteristic polynomial of A (or B) and 
an inversion of a matrix of order n (or m) are found with 
sli^t variations in Barnett [l54], Jameson [215], Lu [234] 
and Sestopal [266]. Bickart [l72] obtains an explicit 
solution involving coefficients of annihilating polynomials 
of A or B. Expressions for the solution of ( 1,4.1) involving 
the matrix [q __g] and the characteristic polynomials of A and B 
are available in Jones [220, 22l]. Muller [250] uses the 
resultant of the characteristic polynomials of A and -B to 
express the solution of (l.4.l). Hartwig [l99] derives a 
finite series solution for X in case 0 can be expressed as a' 
finite series in terms of matrices related to A and B. In 
another paper [200], Hartwig applies the theory of generalized 
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inverses to obtain explicit solutions for the Sylvester equation 
in case A and -B have a common eigenvalue. This singular case 
is studied in Ma [239] also. 

For details of solutions in terms of component matrices 
[l3, p.174] and solutions in terms of adjoint matrices refer 
Lancaster [23l]. 

The equation AX+XB=C can be expressed [27O] in the 


equivalent 

form 



X - UXV = ¥ 

(1.4.13) 

with 

U = (ql _A)*"^(qI +A) 

(1.4.14) 



(1.4,15) 

and 

¥ = -2q(qVA)"^C(qIjj-B)"^ 

(1.4.16) 


for a suitable nonzero constant q. The form (1.4.13) 
suggests an infinite series solution to (1.4.1) given by 

00 

X = E (1.4.17) 

d=i 

In fact, the above series converges [231] iff P(U)P(V) < 1 
and in this case the limit represents the unique solution 
of the Sylvester equation. Based on the representation 
(l.4.17)> Smith [270] proposes the quadratic convergence 
iterative scheme 

^k+1 = » ^0=^ (1.4.18) 

to obtain the numerical solution of (l.4,l) , assuming that 
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A and B are stable. Under this assTJmption, P(U) < 1 and 
P(V) < 1 for any q > 0, 


Another interesting and significant form of explicit 
solution is the one expressed in terms of integrals. Suppose 
and contours enclosing in their interiors respectively 

all and jS^ such that ^ 0 and let = C where 

{a .} and {P •} are the eigenvalues of A and B respectively and 

O 

P, =^/ (aIjn-A)”^da (1.4.19) 

^ 2m 1 -r*. 

■^A 

Pp = ^ / (pIn-B)“^dp. (1.4.20) 

^B 


Then 


X = - 



S S 


'^C(pin-B) 

a+P 


dp dec 


(1.4.21) 


is a solution of (1.4.1) . Moreover, if Re(a .-t-p .) < 0 for all 

X J 

the enclosed eigenvalues, then (l,4.2l) can be expressed as 


X = - / e-^'^Ce®'^ dt. (1.4.22) 

0 

If a.+p . 0 for all i and j, i.e., if cr(A) f) cr(-B) = 0, the 

empty set, where c3f(A) denotes the spectrum of A, then (1.4.21) 
represents the unique solution of (l,4.l) and in this case 
Pa = Pb = ^ and hence the requirement Pj^UPg = C is 
automatically satisfied. If Re(a+P) < 0 for all a «s cj(A) 
and p = cf(B) (which holds in particular if A and B are stable), 
then (l,4.22) is the unique solution of (l,4,l) , These 
results involving integrals are due to Krein [l2] and they 
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are quoted in Kucera [228] and discussed extensively in 
Lancaster [23l]. 

Using differential equations. Bellman [4, p.l79] proves 
that if the integral given in (1.4.22) exists for all C, 
then it represents the unique solution of AX+XB=C. Many 
authors use the formula (1.4.22) to obtain numerical solutions 
of the Sylvester and Lyapunov equations. Regarding this, 
references 'will be given later. 

A ntjmber of other papers dealing 'with the methods of 
solution of AX+XB=C may be recognized from their titles. 

Po'wers [259] proposes a method using two ideas familiar to 
practitioners of control theory? controllability of a matrix 
and ' tearing’ . This method seems to be not so favourable 
since it involves O(n^) operations, for m=n. Bar-Ness and 
Lan^olz £l52] investigates the solution of (l.4.l) as an 
eigenvalue problem of [q . Ingraham and Trimble [214] 
reduce the problem of solving (l.4.l) to a problem in 
polynomial congruences . 

There may be situations in which we require certain 
specific type of solution to AX+XB=C. For example, finding 
nonsingular solution arises directly in the construction of an 
observer for linear time-invariant system [236], Hearon [202] 
gives a necessary and sufficient condition for the Sylvester 
equation to possess nonsingular solution. In view of 
assignment problems, Bouillon and Poole [l75] investigate the 
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existence of integral solution whereas Kabe [222] considers the 
nonnegative and integral solutions. 

In [271], Snyders and Zakai discuss nonnegative definite 
solutions of (1.4.1) with B = A* and C = -DD*. If AX+XB=C 
is not consistent, the reasonable compromise is to find a 
least square solutions that is to find X which minimizes the 
Frobenius norm of the residual matrix AX+XB-C and this problem 
has been treated in Lovass-Nagy and Powers [253]. 

For solving (1.4,1) , Milan! [24?] presents an iterative 
procedure which explores possibilities of partitioning the 
coefficient matrices A, B and C for decomposition of the 
equation into sufficiently lower dimension equations v/hich 
are satisfactorily solved by direct method. Varah [272] 
investigates the feasibility of the iterative scheme 

^k+1 = ^ (1.4.23) 

for solving (1.4.1) . It is easily shown that the above 
method converges for any initial guesss X^ iff p(A”^)P(B) <1. 
The sensitivity aspects of the solution of (l.4.l) have been 
studied in Golub, Nash and Van Loan [l92] and Varah [272] . 

Next, we shall make a brief mention of studies on some 
generalized forms and special cases of the Sylvester equation. 
The general matrix equation Z ^ has already been 

referred in connection with existence theorems. The history 
of this general matrix equation is given in MacDuffee [l5]. 

The discussion made by Lancaster [23l] on this equation is 
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supplemented by Wimmer and Ziebur [276], Vetter [273] deals 
with the yet more general equation 

§ J = F. (1.4.24) 

In particTolar, many authors have concentrated on the 
matrix equation 

AXB + CXD = E. (1.4.25) 

The above equation occurs in the MINQUE theory of estimating 
covariance components in a covariance components model [248]. 

The general two-layer gridwork problems with different 
boundary conditions all lead to the matrix equation of the 
form (1.4.25) [238]. Moreover, the solution of this type of 
equation is required in linear parametric estimation theory of 
normal multivariate statistical analysis [265] and in the 
nmerical solution of certain implicit ordinary differential 
e qu at ions [ 186 ] . 

Mitra [248] describes a method of solution to (1.4.25) 
using canonical representation of a singular pencil studied 
in Ganimacher [8]. Baksalary and Kala [149] give a necessary 
and sufficient condition for (1.4.25) to be consistent, 
together with a representation of its general solution in terms 
of g-inverses for a consistent case. Epton [l86] extends 
the idea of Enright [l85], referred earlier, to solve (1.4.25). 
The equation (l,4.25) is considered, also in Golub et al. [l92], 
Jones [ 219 ] and Scobey and Kabe [ 265 ]. 

Another generalization of the Sylvester equation is 
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the matrix Riccati equation (or matrix quadratic equation) 

§ 

XDX +AX+XB~C=0. (1.4.26) 

Interest in this type of equation stems again from its wide 
application in electrical engineering problems [l46,255]. 

This equation has been studied in Anderson [l46], Beavers and 
Denman [165*166], Campbell and Daughtry [l76], Coppel [179]* 
Daughtry [l80], Jones [217]* Jones [220], Kleinman [226], 

Daub [232], Martensson [242], Meyer [243,244], Potter [255] 
and Wimmer [275]. For additional references on this equation, 
one may consult the very good source book of Barnett [1] on 
matrices in control theory. 

In view of the applications in quantum mechanics, 
scattering theory and in the study of similarity of operators 
[>190], a great deal of work has been done on the operator 
equations corresponding to (1.4.1) and (1.4.6) by Apostol [l47]. 
Freeman [l90]* Goldstein [l9l], Luenberger [235], Lumer and 
Rosenblum [237] and Rosenblimi [260,26l]. 

Several authors, for example, Foulkes [l89]* Gantmacher 
[7* Vol.l], Hartwig [l98], MacDuffee [l5] and Parker [253] 
have studied the equation (l.4.l) with C=0. In addition, 
if B=-A then the problem of solving (l,4.l) reduces to find all 
matrices X commuting with A, This problem dates back to 
Frobenius and has been solved by many authors [228]. For 
some references on this study see Bellman [4, p,30]. Another 
special case of (l.4,l) is AX=C. 
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The most important special case of the Sylvester equation 
is the one mentioned in the beginning of this section. It is 
the Lyapimov matrix equation 

AX + XA* = C, (1.4.27) 

which is also known as the continuous Lyapunov matrix 
equation;, it being associated with continuous -time linear 
system x = Ax. There have been many papers on the Lyapunov 
matrix equation, especially on the numerical solution of 
the real matrix equation 

AX + XA^ = C (1.4.28) 

where A, C s M^^CR) . We have seen earlier that the solution 
of (l.4,28) with C = -I can effectively be used to determine 
the stability of a real matrix A. Here the essential problem 
is to solve 

AX + XA^ = - I (1.4.29) 

admitting that there is a symmetric solution X [223]. Now 
the system (1.4.29) represents n(n+l)/2 linear equations 
for the n(n+l)/2 unknown elements in X, The systematic 
way of constructing the enlarged system of order n(n+l)/2 
is given in Bingulac [174] , Chen and Shieh [l77] and MacFarlane 
[240]. Barnett and Storey [l57] have shown that the number 
of equations can be reduced to n(n-l)/2 by introducing a 
skew-symmetric matrix. It has been shown by calculations 
[l58] that this reduction in the number of equations is 
worthwhile for moderate values of n, say lying between 10 and 50. 
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Now We shall investigate various methods of solutions to 
the Lyapimov matrix equation. Smith [ 269 ] derives an explicit 
expression for X satisfying (1,4.27) and Ziedan [278] develops 
an explicit solution for (l,4.28) involving Schwarz form of A. 

Since the direct approach suffers most with respect to 
increase of computer time, the transform method, for instance, 
the Bart els -Stewart algorithm, is preferable. Power [ 256 ] 
describes an iterative method for solving (l.4,28) when A is 
given in Schwarz or Routh canonical form. Howland and Senez 
[ 213 ] have described a method for solving (l.4,29) when A is in 
upper Hessenberg form and this procedure has been extended to 
complex case by Meyer-Spasche [245]. 

Although the solution of the Lyapunov matrix equation is 
used to solve the stability problem, in many applications, the 
solution of (1.4,28) is required only for the case when A is 
stable. In such a situation there are many efficient algorithms. 
Barnett and Storey [l59] remark that the most promising method 
from the practical point of view is the one based on the 
iterative scheme given in (1,4,18) suggested by Smith [270], 

This method has been discussed by another Smith [ 267 ] also. 

In the light of numerical quadrature, Davison and Man 

[l83] have proposed the following iterative procedure to solve 

(1.4,28) when A is stable J 

k k 

^k+l = \ , XQ=-hC 


( 1 . 4 . 30 ) 




Q = (I - |a + + |a + ^A^) . (1.4.31) 

In [24l], Man generalizes the above method to obtain a 
hi^-order iterative scheme, showing that the optimm order 
is two on the basis of computer time. The comparative 
table showing the number of multiplications necessary and 
storage needed for various numerical algorithms for solving 
(1.4.28) will be presented later. 

An interesting class of iterative methods for solving 
( 1 . 4 , 1 ) when A and B are stable, has been proposed by 
Hoskins, Meek and Walton [205] and it is extensively discussed 
in Hoskins, Meek and Walton [206,207,209], Hoskins, Pathan and 
Walton [210], Hoskins and V/alton [211, 212] and Walton [274]. 
The essential idea behind this class of methods is as 
Let, for k=0,l,2, . . . 

■^k+1 ~ “k^k ^k^k * •^o"'^ 

®k+l " Pk®k^» ®o"® 

and 

^k+1 = “k*^ Pk^kVk^» *^0=^ 

By induction, it follows that 

for k=®0 ,1,2,.,. 

The parameters aj^ and are chosen such that both Aj^ 
converge to -Ijjj and respectively. Consequently, 
solution of ( 1 , 4 , 1 ) becomes 

X = (-V2) . 


follows. 

( 1 . 4 . 32 ) 

(1.4.33) 

(1.4.34) 

(1.4.35) 

and Bj^ 
the 

( 1 . 4 . 36 ) 
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The classical choice of parameters is 

“k = Pk = 1/2 (1.4.37) 

and this choice is associated with the well-known Newton's 
process. In order to accelerate the convergence it has been 
suggested [205,212] to take 


where 


2ai 


“k = 




'5’ ^k ^k^k“k 


= min( llAj^ir^, 
bj^ = maxCllAj^ll, 


(1.4.38) 

(1.4.39) 

(1.4.40) 


with any convenient norm. For the algorithm described by 
( 1 . 4 , 32 ) -( 1 , 4 , 37 ) ? the storage requirement is 2m^+n^+2mn for 
m >_ n and m +2n +2mn otherwise | the number of multiplications 
required per iteration is approximately ■^(m^+n^)+m^n+mn^. 

It is claimed by Hoskins et al. [205] that this algorithm 
practically converges approximately in five iterations to 
achieve a solution with an accuracy of seven decimal places. 


In [208], the above class of iterative procedure is 
considered for the Lyapunov matrix equation (1.4, 28). In 

this case the step corresponding to (1.4,33) will not be 
there and in (1.4.34) and (1.4.35), is to be replaced by 
A^. In the same paper, the following choice of and 
has been proposed for accelerating the convergence: 

2aj^ 


a. 








(1-4.41) 
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along with 

^+1 

- 1-ej^, bj^^^ = 

: l+Sj^ 

(1.4.42) 

where 

e, = 

^^k” y ^k^k|2 


( 1 . 4 . 43 ) 


k 



with 

= 

b„ = 

llAll. 

( 1 . 4 . 44 ) 


Poin'ting 0111 : "bhat the method, works for the last mentioned 
choice in general, only when the spectrum of A is real 
Barraud [ 163 ] has suggested the choice 

With 

in order to cover the complex spectrum case as well. 

Earlier, Beavers and Denman £ 167 ] (also Denman and 
Beavers [54]) have described a method to solve the Lyapunov 
matrix equation (1.4.28) using the concept of matrix sign 
function which in fact is associated with an iterative 
scheme of the form 

\+l “ 2 W » A^=A. ( 1 . 4 . 47 ) 

In the course of solving the matrix differential equation 
X = AX + XB - C, X(0)=Z (1.4.48) 

the solution of AX+XB=C is required [162,182,207,230], In 
this context, Davison [l82] extends the method of Davison and 
Man [l83] based on the numerical qioadrature to solve (l.4.l) 


( 1 . 4 . 45 ) 

(1.4.46) 



54 


when A and B are stable and for the same case Hoskins, Meek 

and Walton [20?] suggest the following values of to 

implement the iterative scheme (l,4,32)-(l,4,36) s 

“k {tr(Sj^)tr(S^^) - p tr(Sj^) }/Yj^ (1.4.49) 

Pk = (■fcr(s2)tr(S“^) - p tr(S^)}/Yj^ (1.4.50) 

^k “ tr(S^)tr(S”^) - p^ (1.4.51) 

where 

S^=A^ and p=m if P(Aj^ ) > P(B^) and P(aJJ^) > P(b"^) 

^k^\ and p=n if P(Bj^ ) > P(Aj^) and P(b”^) > ^(\^) 

and otherwise “ 1/2. 

Before noticing such a choice of aj^, given in 
(1.4.49)-(1.4.51) we have developed independently such a 
choice with a slight modification and along with a generalization 
to solve the Lyapunov matrix equations (l,4.28) and (1.4.27) . 
Moreover, by specializing to the Lyapunov matrix equation, 
it may be observed that the choice suggested by Hoskins et al. 
in ( 1 . 4 . 49 ) “( 1 . 4 . 52 ) may not work in general when the spectrum 
of A is complex. To substantiate this statement we give 
theoretical counterexamples in Chapter 4 where we present 
our algorithms with detailed analysis. 

Several papers are devoted to the comparative study of 
numerical methods for solving the Lyapunov matrix equation 
(1.4.28), for example, Belanger and McGillivray [I 68 ], 

Hagander [197]* Pace and Barnett [252] and Rothschild and 
Jameson [264]. In these papers, the algorithms adjudged 
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to be more efficient are those due to Bartels and Stewart [l64]j 
Jameson [215] and anith [270], Golub et al. [l92] remark 
that the Hessenberg-Schur algorithm offers no advantage over 
the Bart els -Stewart method for the Lyapunov matrix equation. 

Now we shall present the operation coimts and storage needed 
in solving (1.4.28) by some of the standard methods that we have 
seen earlier. For the first four methods mentioned in the tabl^ 
the matrix A s Mj^(IR) is assumed to be stable while for the last 
three methods this is not so. In the table, k denotes the 
number of iterations required for the method to converge 
numerically and the data have been collected from the literature. 


TABLE 1.4.1 

Comparison of numerical methods for solving 
the Lyapunov matrix equation 


S.No. 

Method 

Approximate 
number of 
multiplications 

Approximate 

storage 

locations 

1. 

Davison and Man [183] 

(2.5k+4)n^ 

4n^ 

2. 

Smith £270] 

(also refer [267]) 

2.5(k+l)n^ 

2.5n^ 

3. 

Man [241] 

36. 5n^ 

4n^ 

4. 

Hoskins. Meek and 
Walton [208] 

lOn^k/3 

CM 

5. 

Jameson [215] 

(also refer [264]^ 

O(n^) 

not available 

6. 

Molinari [249] 

5n5 

kr? 

7. 

Bartels and 

Stewart [l64] 

(2+4cy)n^+3 .5n^ 

where a is^the 
number of OR 
steps required. 

cvl 

n 
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It seems that except probably in No. 4 and 3 } in all 
other methods C and X are assumed to be symmetric in calculating 
the number of operations. 

Returning to some theoretical development on the Sylvester 
and Lyapunov equations, one can see expressions for botinds on 
solution of (1.4.1) in [231,270]. Boionds for the extreme 
eigenvalues of the solution matrix X of the Lyapunov matrix 
equations (1.4.27) and (1.4.28) for the case when A is stable 
and C is negative definite are discussed in [229,231,268]. 

The solvability of simultaneous Lyapionov matrix equations 
AX + XA* = XA + a'^X = I (1.4.53) 

where X is assumed to be Hermitian is a problem proposed by 
Taussky [82] and it has been studied by Davis [l8l], Gottlieb 
and Gunzburger [l93] and Barker [l50,15l]. 


Finally, v/e shall mention a few words about the so-called 
discrete Lyapunov matrix equation 

a'^XA - X = Q (1.4.54) 

associated with the linear discrete-time system 


= Ax, 


(1.4.55) 


Vi -k- 

It has been shown by Power [257] that the continuous and 
discrete Lyapunov matrix equations may be converted one to 
the other through the well-known Cayley transform referred 
in Section 1.2. The solution of (1.4.54) has important 
applications in the design of linear discrete systems [l7l] . 
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Details of methods of solution of this equation may be found 
in Barnett [155], Barraud £l6l], Berger £l7l], Kitagawa [ 225 ] > 
Power £258], Smith £269] and Young £277]. 

The equation 

2 a. .A%(A^)3 = C (1.4.56) 

which contains both the continuous and discrete Lyapunov 
matrix equations as special cases has been discussed by 
Barnett £l56]. Techniques for solving the extended Lyapunov 
matrix equations 

AX + XA^-20X=C (1.4.57) 

and 

P"^AXA^ - X = C (1.4.58) 

are presented by Heinen £203,204]. 

In the thesis, we also draw attention to certain projection 
and residual projection methods which can be effectively used 
for solving the Lyapunov and Sylvester equations even in 
singular cases. 



2. NORMAL MATRICES AND ANGULARITY 


2,1. Introduction 

The purpose of this chapter is to introduce and study a 
notion of angularity as a generalization of the well-known 
concept of inertia of a matrix. The angularity characterizes 
the distribution of arguments of eigenvalues of a matrix and 
we give the formal definition of angularity in the section 
to follow. 

In practical problems it is often insufficient to know 
merely that a linear system is asymptotically stable [2]. It 
is important to guarantee a certain degree of stability that 
can ensure a better performance of the transient process. 

This concept, known as relative stability [2,19] has mainly 
the following two specifications. 

Suppose and S2 are the two linear systems x = Ax and 
X = Bx respectively. If all the eigenvalues of A lie in the 
half plane Re(z) < a2_ and those of B in the half plane ReCz) < a2j 
the system S2 may be regarded as more stable than S^ if 
a2 < oci < 0. The second specification of relative stability 
is related to certain angular sectors of the complex plane. 

If all the eigenvalues of A lie in the sector |arg(-z)j < (3^ 
and those of B in the sector larg(-z)| < ^ 2 * system $2 
may be regarded as having a better damping than if 
0 < ^2 ^ ^ other words, the oscillations in the 

transient response will be less in S2 than in (see Harden 
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relation with Hermitian matrices , almost all angularity theorems 
we prove in the thesis are concerned with normal matrices. 

2.2. Angularity of a Matrix 

We recall from Section 1.2 that the inertia In(A) of A s 
is the ordered triple (tcCa), v(A), 5(A)), the entries denoting 
the total number of eigenvalues of A with positive, negative, 
and zero real parts, respectively. Geometrically speaking, 

A has It; ( a) eigenvalues in the open right half plane, v (a) 
eigenvalues in the open left half plane and ^(A) eigenvalues on 
the imaginary axis, all counting multiplicities. The inertia 
In(A) thus depends on the distribution of arguments of 
eigenvalues of A. This dependence is complete in the case of 
angularity e[A] of A which we define in this section. Indeed, 
the aim of the present section is to explain and illustrate 
the concept of angularity of a matrix. In order to define 
©[a], following Cain [36] we shall first define the ray space 
of the complex plane C. 

DEFINITION 2.2.1. The ray space of C is defined as the 

set fi = {{O}} U 0 £ © ^ Six} where = (0,°°) . 

A general element of Q is called a ray and is denoted by w , 

10 

oj = {0} being called a null ray and m = e (O £ 9 < 2'Tt) 

being a proper ray. 

It is understood that in i = and 6 will always 

We note that the restriction 0 £ 9 < 27 i: is of no 


be real. 
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particular advantage in specifying the rays and hence in the 

^ ^ *4* 2. krc ) 

sequel, the ray w = e IR^., where k is any integer will 

be identified with the ray m = e^%.+ . 

By an extended ray we mean a straight line through the 
origin dividing the complex plane into two half planes. 
Clearly, an extended ray is the union of null ray and two 
proper rays with 6 = a and 0 = a+n, a being arbitrary. We 
shall now introduce some notation. 

= {e^®3R+! |©1 < n/2} 

= {e^®nR_|_; n/2 < G < 3r^/2} 

n = f2\ {0} 
p ' 

Evidently the totality of the rays in n , n ^ , and n 

+ - o p 

refer respectively the open right half plane, open left half 
plane, imaginary axis and the origin deleted conplex plane. 

We can now give the definition of angularity of a matrix. 

DEFINITION 2.2.2. The angularity ©[A] of A c is a 
mapping from fi to IN, the set of nonnegative integers for 
which ©[a]j^ is the number of eigenvalues of A (counting 
multiplicities) lying on the ray oj . 

This notion of angularity has been introduced 
independently in Cain [36] and Rathore and Chetty [75]. 

It may be noted that 
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7c(A) 

= S 

3 

1 — r 
< 

(2,2.1) 


0) c 


''(A) 

= 2 


(2.2.2) 


0) s q 



^(A) 

= 2 

e[A] 

(2.2.3) 


U) s 

Ci) « 


Moreover A is stable 

iff 9[A]^ 

= 0 for all CO « . 



DEFINITION 2.2,3. Two matrices A and B c are said 
to be equiangular iff 0 [a] = 6[b] , i.e.y 
all (0 s q . 

Obviously if A and B are equiangular then they have the 
same inertia, without the converse being true in general. 

For example, A = [^ “■^] and B * [q have the same inertia. 
However, it is easily verified that they are not equiangular. 

In case if both A and B are Hermitian then they are equiangular 
iff they have the same inertia. 

As we have pointed out earlier, we deal primarily with 
normal matrices in proving angularity theorems. Various 
types of matrices with which we are familiar are, in fact, 
normal. Among these are the diagonal matrices, Hermitian 
matrices, skew-Hermitian matrices, unitary matrices and 
circulants [4, p,242]. In this connection we shall find 
a necessary condition for two normal matrices to be 
equiangular. In Section 2.4, we show that this necessary 
condition is also sufficient. 
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THEOREM 2,2.1. Let A and B be two equiangular normal 
matrices. Then there exists a nonsingular matrix C such thab 
C*AC = B. 


Proof. It is based on the spectral theorem. Let 

U and V be the unitary diagonalizers of the normal matrices 

A and B respectively. Since A and B are equiangular, 

without loss of generality (making use of a similarity 

transformation by a permutation matrix) we can assume that 

ia^ iar 


U 


*AU = diagCr^e ^,...,rjjje “, 05 ,..., 0 ) 


and 




*BV * dlag(R^e“l R^e“'”,0, . . . ,0) 


With v.f R. > 0 and a. real, j=l,,..,m. Let D be the diagonal 
11 1 . 

matrix whose i-th entry is (R./r.)"^' , j=l,...,m and the 

J J 

remaining entries are unity. Then D*(U*AU)D = V*BV. Hence 
with C = UDV^ we have C^AC = B. This completes the proof, AAA 


We say that In(A) <. In(B) iff tcCa) £ tc(B) ane ^(A) £ '^(B) . 
The same definition can be adopted even if A and B are of 
different size [4?]. Now we shall define an analogous 
relation for the angularity of two matrices. 


DEFINITION 2.2.4. If A and B are two square matrices, 
may be of different size, then we say that 0 [a] £ ©[b] iff 
0 [a] < 0[b] for all co c . 

From this definition it is clear that if ©[A] < ©[B] then 
In(A) < In(B) . However, the converse is not true in general. 
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Finally, we list some simple observations regarding the 
notion of angularity of a matrix. 

(i) 9 [a] = = gCs'^AS] = 6[kA], k being a positive 

number. 

(ii) ©[A“^] = 0 [a] = 6[A*], A denoting the conjugate of A, 

(iii) Although InCA) = In(A”^) , it is not true in general 
that 0 [a] = 6[A“^]. However if A is real then 
e[A] = e[A'^]. 

(iv) We know that In(A+A“^) = In(A) . However even for a 
real matrix such a relation is not true with the 
inertia replaced by angularity. 

(v) ©[AB] = 0[BA] 

(vi) If A is a quasi-diagonal matrix diagCA^^, , . . ,Ajj^) , 

Then ©[A] = ©[A-,] + , . . + 0[A_] for all w s S2 , 

CO (A) lii CO 

2.3. An Application of Angularity in Solving a Linear System 

In the introductory section of this chapter we mentioned 
that a knowledge of the angularity of a matrix is useful in 
the study of relative stability and in reconstructing the 
eigenvalues in case their magnitudes are known. In the 
present section we indicate one more application of the 
notion of angularity in solving a linear system 

Ax = b (2.3.1) 

where A is an nxn non-Hermitian normal matrix. 

Linear systems with this form of A occur in finite 
difference approximations to certain partial differential 
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equations [282], in the solution of transport equations by 
finite difference techniques [289], in solving banded Toeplitz 
matrices by circular decompositions [295] and possibly in many 
other applications , 

To motivate considering an iterative method to solve 
(2,3.1) , let us write it in the equivalent form 

(A*+A)x= (A*-A)x + 2b. (2.3.2) 

This form suggests the iterative scheme 

(A*+A)xj^+^ = (A*-A)xj^ + 2b (2.3.3) 

for solving (2,3.1). A necessary and sufficient condition 
for the convergence of this process may be expressed in terms 
of the angularity of A as in the following theorem. 

THEOREM 2,3.1. Suppose A is normal such that ^(A) =0 
and A Then for an arbitrary b s the iterative method 

(A*+A)x^^^_ = (A*-A)Xj^ 2b 

converges to the solution of Ax=b for any choice of iff 

^[a]^ = 0 for all oj s S2\(S^ U S2) (2.3.4) 

where 

= {e^®]R^: 0 < je| < 7t/4} (2.3.5) 

and 

= {e^®3R_^: 0 < |7i-e| < %/k} . (2.3.6) 

Since A is normal and 6(A)=0, A*+A is nonsingular. 
Hence the given iterative scheme can be rewritten in the form 

Vl " 2(A*+A)“^ (2.3.7) 
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1 

where B = (A*+A) (A*-A) . It is well known that for an 

arbitrary b s (2,3.7) converges to the solution of Ax=b 
for any initial iff P(B) < 1. Furthermore we know that 
if the eigenvalues of A are then the eigenvalues of 

B are Im(-Aj^)/Re(^^) , i=l,,.,,n. From this it is readily seen 
that /^(B) < 1 iff (2,3.4) holds. This completes the proof .AAA 

It may be noted that in the preceding theorem if A 
happens to be Hermitian then B=0 and hence there is no 
iterative character in the method proposed. Thus, if we 
know the angularity of A, assuming that A is non-Hermitian and 
normal, then we can decide whether to proceed or not iteratively 

I 

as described above for solving Ax=b. This method is 
particularly advantageous when 

e[A]^ = 0 for all u (2.3.8) 

In this case A*+A is positive definite and hence can be 
factorized in the form LL where L is lower triangular. 

Therefore each step in (2,3.3) can be solved by the Cholesky 
method [2l]. It may be noticed that the factorization LL 
has to be performed only once for the entire process, i.e. 

I 

ohly for the first iteration. For the comparative study of the 
operation counts, let us further assume that the system Ax-b is 
real. The proposed iterative-cum-Cholesky method requires 
n square roots, (n^*i9n'^+2n)/6 multiplications and (n^+6n -7n)/6 
additions in the first iteration [2l], These counts do not 
include the number of operations required in simplifying 
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the right side of (2.3.3) . For this simplification we require 
n^-n multiplications and n^-n additions, noting that the diagonal 
elements of A*-A are zero since A is assumed to be real. In 
fact, in the first iteration these operations do not come into 
the picture if we choose the initial vector as the zero vector. 

It is not difficult to verify that each one of the 

successive iterations requires n‘^-n multiplications and n‘^-n 

2 

additions in simplifying the right hand side vector; n +n 

2 

multiplications and n -n additions in solving the two triangular 
systems involved in the Cholesky method. Thus the total number 
of operations for the proposed method comes to n square roots, 
(n^+9n^+2n)/6 + 2(m-l)n^ multiplications and (n^+6n^-7n)/6 + 
2(m-l) (n^-n) additions, m being the number of iterations 
required for the numerical convergence to take place. On the 
other hand, for solving Ax=b by Gaussian elimination, we require 
(n^+3n^-n)/3 multiplications and (2n^+3n^"*5n)/6 additions. In 
practice, to compare the number of operations it is sufficient 
to consider only multiplicative operations. Thus we See that 
for large n, the proposed method requires about n^/6 
multiplications whereas Gaussian elimination requires about 
n^/3 multiplications. This shows that the proposed method 
may still be better than Gaussian elimination for large n. 
Moreover no pivoting is needed in the Cholesky method [2l]. 

In connection with the method suggested above for solving 
the linear system, it is desirable to find some necessary 
and sufficient condition for A to satisfy (2.3.8), i.e., for 
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all the eigenvalues of A to lie in the sector defined by 
(2.3.5). If A is normal then a necessary condition for A 
to satisfy (2.3.8) is that A^+A > 0. However it is not 
sufficient. If A is nonnormal then the positive definiteness 
of A*+A need not be even a necessary condition for A to satisfy 
(2.3.8). It is easily verified if we consider A = [q 

If A = (aj_j) is real then a sufficient condition for A 
to have all the eigenvalues in is 

Hi > ^ i=l,...,n. (2.3.9) 

d/^i 

However this is not necessary. This result is an immediate 
consequence of the fact that if (2,3.9) holds then all the n 
Ger^gorin discs [16, p.l46] of A lie in Sj. 

In the next theorem, we give a condition which is necessary 
as well as sufficient for any general (need not be normal or 
real) nxn matrix to have the property (2.3.8). 

THEOREM 2.3.2. Let A c M^. Then 9 [a]^ = 0 for all 
0) s iff the inertia of 2nx2n matrix is (2n,0,0) 

or, equivalently iff [ ^ stable. 

Proof. Let denote the characteristic values 

of A, We note that 

[a "1]©^= C say. 

where denotes the Kronecker product of matrices. Since 
the eigenvalues of are exp( ± iTc/4) (the notation 
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exp(x) is often used in place of e^, especially when x is a 

complicated expression) , by a property mentioned earlier 

regarding the eigenvalues of the Kronecker product of two 

matrices, the eigenvalues of C are v/2 k .exp( i ijt/4) , j=l,,..,n. 

0 

Consequently 5 we see that A satisfies (2.5.8) iff the eigenvalues 
of C lie in the open right half plane. This completes the 
proof of the theorem. SJJk 

In connection with Theorem 2.3.1, we shall now provide 

a simple necessary and sufficient condition for (2.3.4) to be 

true for any A s M . 

n 

THEOREM 2.3.3. Let A s Then e[A]^ = 0 for all 

o e n\(S^ U S^) iff In(A2) = (n,0,0) or, equivalently iff A^ 
is positive stable. 

Proof. If z s C, Re(z^) > 0 iff either 0 < |arg(z) 1 < n/h 
or 0 £ l7i;-arg(z) 1 < 7i/4. The theorem is clear now. AAA 

Finally, we note that the iterative scheme (2.3.3) may be 
rewritten in the equivalent form 

^k+1 " ^k ((A*+A)/2)”^(b-AXj^). (2.3.10) 

An implementation of (2.3.10) yields the residual b-AXj^ as a 
bonus at each iteration. This could be useful for many 
purposes such as in deciding the accuracy of the solution or 
the termination criterion. 

2.4. Angularity Theorems 

In this section we are concerned with the angularity of 
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a normal matrix and that of a matrix congruent to A, i.e. 
a matrix of the form C*AC where C is nonsingular. It is 
proved that if B and C are nonsingular then B*AB and C*AC 
are equiangular provided that the two matrices mentioned 
latter are normal. Some well-known inertia theorems (e.g. 
Sylvester’s law of inertia) have been deduced as corollaries 
of this main result. 


In order to prove our main res\ilt, we shall first prove 
a basic lemma. The method of proof closely resembles that 
used by Lancaster [l3, p.89] to prove the classical version 
of Sylvester’s law. 


LEMMA 2.4.1. If B*AB and C*AC are diagonal matrices 
where B and C are nonsingular, then 'n:(C*AC) = ^(B^AB) , 

Proof. Let B*AB = diagCA, , . . . ) and C*AC = 

diag(4^s . . . where without loss of generality we may assume 

that Re(4^) > ... > Re(?^) and Re(n^) > ... > , Re(z) 

denoting the real part of 2 . Suppose that Re(Ap) , Re(M-q) > 0 

and Re(^p+ 2 .)» Re(Pq+l) <0. We have to prove that p=q. 

Denoting the columns of B and C by ’’J-q > • 

Vq,...,Vj^ respectively, let us form the subspaces and W 2 
of spanned by . . . ,Uj^ and . . . ,v^ respectively . 

If q > p, then dim(Wq) + dim(W 2 ) = n-p+q > n where dim(W) 
stands for the dimension of the space W, Hence there exists 
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Then 


and also 


Re(x*Ax) = Re( . S < 0 

c)=p+i a 0 ~ 
q 2 

ReCx^Ax) = Re( £ jb, 1 M >0. 

k=l ^ ^ 


This contradiction leads to the conclusion q ^ p. By 
considering the subspaces spanned by u^,...,Up and 
a similar argument can be made to prove that p <, q. Hence 


We next come to our principal result of this section. 

THEOREM 2.4,1. If B*AB and C*AC are diagonal matrices 

where B and C are nonsingular, then 0[C*AC] = 0[B*AB]. 

Proof. Let A^ = A exp(i(S -a)), a being a real number. 

■““•is 

Appl 3 /'ing Lemma 2.4.1 to the diagonal matrices B*e A^^B 

and C*e ^^4^,0 where e>0, we find that B*Aq.B and C*Aq.C have 

the same number of eigenvalues in the open half plane 
i6 

{e 3R_^s - “ + e<0<~ + e}. Moreover, these two matrices 
have the same number of eigenvalues in the open right half 
plane. Since we can choose e sufficiently small such that 
both B*Aq;B and C*Aq;C have no eigenvalues in the regions 
{e^^lR^s - ^ < 0 < - § + e} and {e^®]R_^s § < 6 < 5 + e} we see 
that the two diagonal matrices B*A^B and C*A^C will have the 
same number of eigenvalues on the open upper half of the 
Imaginary axis. But this is equivalent to say that B*AB and 
C*AC have the same number of eigenvalues on the ray {e^3R^}. 


Since a is arbitrary the theor^i follows 
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Next, suppose that and C*AC are normal matrices where 

B and C are nonsingular. If U and V are the respective unitary 
diagonalizers of these normal matrices then hy the above theorem 
6[V*C*ACV] = 0[U*B*ABU] which implies that C*AC and B*AB are 
equiangular. Thus we arrive at 

COROLLARY 2.4,1, If B*AB and C*AC are normal matrices 
where B and C are nonsingular, then 0[C*AC] = 9[B*AB]. 

Also, we have the following two immediate corollaries, 
COROLLARY 2.4,2. If A and C*AC are diagonal where C is 
nonsingular, then 9[ C*AC] = ©[a]. 

COROLLARY 2,4.3. If A and C*AC are noimal where C is 
nonsingular, then ©[C*ACj = ©[a]. 

Another interesting corollary in this sequence is 
COROLLARY 2.4.4. If A and C are circulants and C is 
nonsingular then ©[C*ACj = 0[A]. 

The last corollary is an immediate consequence of the fact 
that the product of circulants is again a circulant and the 
circulants are always normal [l3, p.267]# We will study 
circulants in detail in the next chapter. 

REMARK 2.4.1. One can readily prove that Theorem 2.4.1 
and its first three corollaries are all equivalent to one 
another. 

We have already shown that Theorem 2.4.1 ==> Corollary 
2.4.1. Obviously Corollary 2.4.1 ==> Corollary 2.4.3. Since 
diagonal matrices are always normal, we have Corollary 2.4.3 
==> Corollary 2.4.2. Finally, if B*AB and C*AC are diagonal 
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then by viewing C*AC as (B'“^C)*B*AB(B“^C) it follows that 
Corollary 2.4.2 ==> Theorem 2.4.1. This completes the 
cycle. 

REMARK 2.4,2, Using the polar decomposition of C, 
Corollary 2.4.3 has been proved independently by Cain [36, 
Theorem 6.5]. 

Combining Theorem 2.2.1 and Corollary 2.4.3 we have the 
following theorem which gives the necessary and sufficient 
condition for two normal matrices to be equiangular. 

THEOREM 2,4.2. Two given normal matrices A and B are 
equiangular, iff there exists a nonsingular matrix C such that 
C*AC = B. 

We have seen, in Corollary 2.4.3» that if A and C*AC are 
normal, C being nonsingular, then they are equiangular. If 
the normality restriction on C*AC is removed in this, then 
9[C*AC] need not equal 9 [a] as may be easily .verified by taking 
A = [ J °] and C = _1] . 

In this connection, we ask ourselves the following 
question. Let A be normal and C be nonsingular such that 
©[C*AC] = ©[a]. Does it imply that C^AC is normal? 
Alternatively is there any nonsingular C such that C*AC is 
nonnormal, A normal and 0[C*AC] = ©[a]? Let us put this 
question in another simple form without involving angularity 

t 

explicitly. 
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Suppose A is normal, C is nonsingular and C*AC and A are 
equiangular. Let U be the unitary diagonalizer of A so that 
U*AU = D. By Schur' s theorem [4, p.202], there exists a 
unitary matrix V such that V*(C*AC)V = T where T is an upper 
triangular matrix. It is well known that C*AC is normal iff T 
is diagonal. If T is expressed as D + T where D is diagonal 
and T is strictly upper triangular, then from the hypothesis 
D and D are equiangular. Hence by virtue of Theorem 2,4,2, 
there exists a nonsingular matrix W such that W DW = D, Now 
we have V*C*UW*DWU*CV = D + T. Writing WU*CV as M, it reduces 

rs/ rvj 

M DM = D -f T. In view of this argument, the problem proposed 
in the last paragraph may be stated as follows.” 

Let D be a diagonal matrix and C be nonsingular such that 

C*DC = D + T, (2.4,1) 

T being strictly upper triangular. Does it imply that T = O'i* 

If either C or D is real then it can be easily shown that T = 0, 
If C is real then C = C , Hence by taking transpose on both 
sides of (2,4.1) , we have T = T’ implying T = 0. If D is real 
then by taking conjugate transpose on both sides of (2,4,l), 
it follows T = T* and hence T = 0. Therefore if T 0, then 
necessarily both C and D should be complex. It is not known 
to us whether in general (2.4.1) Implies T = 0. However in the 
following theorem we are able to answer affirmatively our 
question for the case n=2. For n> 3f the problem remains open. 
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THEOREM 2.4.3. Suppose D is a 2^2 diagonal matrix and 
C is a 2x2 nonsingular matrix such that C*DC = D + T, T being 
strictly upper triangular. Then T = 0. 

Let 



Hence from C^C = D + T, we have 


k[| 2] [4 2 j = [4 


■b d-* ^0 d2 


0 




where k = det(C) = ad-bc. From (2.4.2) we get 


kad^ = dd^ - ct 
kcd 2 = at - bdj 
kHd^ = -cd2 
kdd^ = ad^. 


(2.4.2) 


(2.4.3) 

(2.4.4) 

(2.4.5) 

(2.4.6) 


From (2.4.3) and (2.4.4) or directly from C*DC = D + T, we also get 

t = abd-]_ + cdd 2 . (2.4.7) 

Our aim is to prove that t=0 in any case. Now let us consider 
the case d-j^=0. Then from (2,4.3) it follows that c=0 or t=0. 

If c=0 then from (2.4.7) i t=0. If d2=0, by (2.4.5), bd-j_=0, 

I 

which by (2,4.7) implies that t=0. So we assume that d 2 d 2 /^ 0. 
Consequently, we see that jkl=l. Now from (2.4.6), a=kd. 
Substituting this in (2.4,3) gives that c=0 or t=0. If c=0 
then from (2.4.5), b=0 and hence from (2.4.7), t=0. Thus in 
any case t=0. This completes the proof. 


In view of the above theorem and Theorem 2.4.1, we have 
thus established for 2x2 matrices the f oll|w|lig result . 
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THEOREM 2.4,4. Let A be normal and C be nonsingular. 

Then C AC and A are equiangular iff C AC is normal. 

For matrices in n > 3 one part is always true and it 
will be interesting to investigate the other part also. 

Since the normality of C*AC is of fundamental importance 
in our angularity theorems, we proceed now to derive some 
results involving the normality of C*AC. As a first result, 
we shall now characterize the class of all nonsingular matrices 
C for which C AC is normal for every normal matrix A. This 
result will also find an application in proving one of our main 
theorems regarding the angularity-preserving linear 
transformations in the forthcoming chapter, 

THEOREM 2.4,5. For a nonsingular C, C^AC is normal for 
every normal matrix A, iff C is a nonzero scalar multiple of a 
unitary matrix. 

Proof. The "if" part is obvious. Conversely, if C*AC 
IS normal for every normal A, then ACC A = A CC A for every 
normal A. ¥e may choose A to be a diagonal matrix with any 
one of ■+*he diagonal entries as 1 and the remaining (n-l) 
diagonal entries as i (= yTI) , to show that CC* is diagonal. 
Further, if we choose A as the matrix E. .+K .+E, . -E., , j k 
where E^^ denotes the matrix whose (r,s) element is unity and 
all other elements are zero, it is easily seen that the o-th 
and k-th diagonal entries of CC are equal. This holds for 

■H' 

every pair of j and k. , Hence, noting that CC > 0, we have 
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CC = al, a>0. The theorem now follows immediately. AAA 

In the next theorem we give a sufficient condition for 

■X- 

C AC to be normal when A is normal, 

THEOREM 2.4.6. Let A be normal. Then C*AC is normal if 
A commutes with CC . 

Proof. Following Bellman [4, p.27], let us introduce the 
Jacobi bracket symbol [A,B] to denote AB-BA, the commutator of 
A and B. By definition, A is normal iff [a,A*] =0. Now 
if we write P = CC* we have AP=PA and also PA*=A*P since P*=P, 
By simple calculations we have [C*AC,C*A*C] = C*P[A,A*]C = 0 
and the result follows. AAA 


We remark that the above result holds even if C is 
singular or in general if C is an nxm matrix with, of course, 
A in Mj^. We study the angularity theorems with C as singular 
or rectangular in the following section. 


We also remark that the commutativity of A and CC , 
however, is not necessary for C*AC to be normal if A is given 

to be normal, A simple example to see this is A 
1 

■0 1 - 


c = [n h 


rl 0. 

Lq 


As an immediate consequence of Theorem 2.4.6 we have 
THEOREM 2.4,7. Let A be normal and P be positive definite 

such that PA = AP. Then for any C satisfying CC* = P, 

0[C*AC] = 0[a]. 


We next turn to the angularity analogue of Sylvester's law 
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of inertia. We have already seen the importance and various 
forms of this classical law in the review chapter. In 
literature, this lav/ is usually associated with Hermitian 
matrices only. In the course of studying its angularity 
analogue, we propose, now, to extend this association to a 
generalized class of Hermitian matrices. This consists of the 
so-called co-Hermitian matrices introduced by Ballantine 
[280,281] and defined as follows. 

DEFINITION 2.4,1, A matrix K is said to be 

co-Hermitian iff e K is Hermitian for some real 

Equivalently K e is co-Hermitian iff K = zH for some 
nonzero complex number z and H Hj^. We shall denote the 
class of all nxn co-Hermitian matrices by Kj^. Evidently, 
contains H^ as well as the class of all nx n skew-Hermitian 
matrices. Another simple, but interesting observation is 
that co-Hermitian matrices are always normal. 

Before coming to the proposed study of angularity analogue 
of Sylvester's law, we shall give some characterizations of 
co-Hermitian matrices. 

LEMMA 2.4.2. A normal matrix K s M^j is co-Heimitian iff 
all the characteristic roots of K lie on an extended ray of the 
complex plane. 

Proof. This is an immediate consequence of the well-known 
result about Hermitian matrices [l6, p.64], that a normal matrix 

A is Hermitian iff the characteristic roots of A are all real. 

AAA 
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LEMMA 2.4.3. K is co -Hermit ian iff K* = ocK for some 
complex ntJmber a such that la 1=1. 

Proof. This result is given in Ballantine [28l]. If 
K* = aK, then by expressing a as exp(ip) , we find that eXp(i^/2)K 
is Hermit ian. Hence K is co-Hermitian. 

On the other hand, if K = zH for z (^O) s C and H e H^^ 
we have K* = (z/z)K. Hence K* = aK with la 1=1. 

In Theorem 2.4.5 we characterized the class of all 
nonsingular matrices C such that C*AC is norroal for every 
normal matrix A. Now in the following lemma, we shall 
characterize the class of all matrices A such that C*AC is 
normal for every nonsingular matrix C. The answer, turns out 
to be the class of all co-Hermitian matrices. 

LEMMA 2.4.4. Let A s Then C*AC is normal for every 

nonsingular C iff A is co-Hermitian. 

Proof. Suppose A s Eh. Then A = zH for z (^0) e C 

and H c H^, By st:^ght forward calculations it follows that 

[C*AC,C*A*C] = 0, completing the sufficiency part of the lemma. 

Next, suppose that C*AC is normal for every nonsingular C. 
Therefore, we immediately infer that A is normal. Let 
U*AU = D, where U is unitary and D = diag(\^, . . . , \^) . For 
the choice C = U(I+J) where I is the identity matrix and J is 
the matrix of unities, i.e., the matrix having all its elements 
as 1, the normality of C*AC leads to the relation 

D(I+J)^D* = D*(I+J)^D 
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which in view of the fact = nJ further simplifies to 
DJD = D JD. From this we get 

^ for i,o=l, . , . s,n. 

10 10 

and consequently for i=l,,..,n, a being a constant 

such that ia|=l. Hence we have D* = aD and thereforej, A* = aA 
which completes the proof of the lemma by virtue of the 
preceding lemma. AAA 

We shall now study the angularity analogue of Sylvester* s 
law. The first assertion of the following result can be 
considered as the angularity analogue of Sylvester’s inertia 
theorem . 

THEOREM 2.4,8, The following three statements are true 
and are equivalent to each others 

(i) If C is nonsingular and K e K^, then 0[C*KC] = 6[k]. 

(ii) If P > 0 and K c then e[PK] = ©[K]. 

(iii) If K s K and RK > 0, then ©[R] = ©[K*]. 

n 

Prop f , (i) is a consequence of Corollary 2.4.3. To 

prove (i) ==> (ii) , as P > 0, it has a positive definite 
Hermitian square root p^/^. Thus ©[PK] = ©[P^/^P^/^K] = 

©[ = ©[k]. In view of the positive definiteness 

of CC*, (ii) ==> (i). 

As HK > 0, K is nonsingular and hence ©[R] = ©[RKK""^] = 
0[K“^] = 0[K*] showing that (ii) ==> (iii) . The converse 
follows in a straightforward manner if one carries out the 
steps as in the proof of Corollary 3 of Ostrowski and Schneider 
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[7l]. However, for the sake of completeness we shall give the 
proof. If K is nonsingular, then P = PKK~^ > 0 and hence by 
(iii) ©[PK] = 0[K~*] = ©[K] Implying (ii) in this case. Here 
and in the sequel the notation K is used to denote (K ) 
which is also (KT^)*. 


TT 

If K is singular, we choose a unitary U so that U IfU = D = 
r D*| 0-1 

Lq O-J partition form where is a nonsingular diagonal 
matrix, noting that K is normal. Partition Q = U*PU as 


Si S2 
Si S2 


so that QD = U*(PK)U = 


QllD^ 0 

®2lS ° 


We have to 


prove that ©[PK] = ©[K] i.e., ©[QD] = ©[d]. To establish this 


.-1 


it is enough to show that ~ Since '“S’ 

and ^^ 22 ^ 2) ^2 ~ ^11 ^ (iii) ®t^21^1^ ~ ^ ~ ®L^ 2 ^* 

The proof of the theorem is therefore complete. 


Recently, it has been reported by Cain [36] that D.G.Hook, 
in an unpublished thesis written under C.S.Ballantine, has 
given a nice characterization of matrices of Sylvester type, 
i.e. matrices A satisfying In(C*AC) = In(A) for every 
invertible C, The result is as follows! 

THEOREM 2.4.9 [36]. In(cV) = In(A) for every invertible 
C iff the numerical range (or the field of values) of A, 

W(A) = {x*Ax! x*x = 1 } 

lies either (i) in a straight line through 0 or (ii) in the 
open right half plane together with 0 or (iii) in the open 
left half plane together with 0. 
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Motivated by this result, we characterize below, the 
class of all matrices A satisfying 9[ C*AC] = 9 [a] for every 
invertible C. Once again, as in Lemma 2.4.4, the answer 
turns out to be the class 

THEOREM 2.4.10. Let A s Then e[C*AC] = 9 [a] for 

every* nonsingular C iff A e K^. 

Proof. Theorem 2.4.8 (i) furnishes the proof of "if” 
part. Conversely, let 9[C*AC] = 9 [a] for every nonsingular C. 
From this, it is clear that In(C*e^AC) = In(e^“A) for every 
nonsingular C where a is an arbitrary real number. By the 
above theorem, we see that for each real a, W(e^‘^A) , i.e., 
e^W(A) lies in any one of the three sets mentioned in the 
statement of that theorem. 


Our claim is that for every a, the first possibility holds. 
If it is not so, then for a = ao there will exist 

Zf (= r2exp(i02^)) and Z2 (= r2exp(i02)) with rp, r2, Re(z]_), 
Re(z2) > 0 and 0 < < m such that z^^, Z2 c expCia^) W(A) , 

Cons equently , for 





and 


p = + {‘^“(02^+02^/^* 

Z2 « e^^W(A) where 

z^ = r^exp{ (m+0^-02) V2} 

^2 ~ r2exp{ ('Ji+02'^l) 4 / 2 } . ► 


It can be easily verified that 

Re(z2) Re(z2) < 0. 
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Moreover arg(z^) -•arg(z 2 ) is not a multiple of 7i. Hence for 
a = p, e^'^V/CA) does not satisfy any one of the three conditions 
mentioned earlier. It is a contradiction. Hence for every a, 
e^W(A) lies in a st:^ight line through 0, In particular* W(A) 
lies in an extended ray which in turn implies that for some 
real 0, x'*'(e"”^^A)x is real for all x s C such that x*x = 1. 
Consequently, we have that e ^^A = e^^A*. Hence by Lemma 2,4.3* 
it follows that A s K^. 

The argument just used is a consequence of the fact that 
if x^'Ax is real for all x « then A is Hermitian. We may 
also note that A is Hermitian iff W(A) is an interval on the 
real line [l6, p.l69]. Based on this we can conclude that A 
is co-Hermitian iff W(a) is an interval on an extended ray. 

We may summarize the conclusions reached so far about 
co-Hermitian matrices in the form of a theorem. 

THEOREM 2.4.11. Let A s M^^. Then the following 
statements are equivalent. 

(i) A c K^. 

(ii) A is normal and has all its eigenvalues on an 
extended ray. 

(iii) A* = aA for some complex number a such that |al=l. 

(iv) C*AC is noiroal for every nonsingular matrix C. 

(v) e[C*AC] = e[A] for every nonsingular matrix C. 

(vi) W(A) is an interval on an extended ray. 
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Since ©[a] = ©[b] implies that In(A) = In(B) , the 
angularity results proved in this section, namely Theorem 2.4.1, 
its four corollaries, Theorems 2.4.7 and 2.4,8 may be restated 
as inertia theorems by replacing "0" by "In". For instance, 
one such a result is 

THEOREM 2.4.12. If A and C*AC are normal and C is 
nonsingular then In(C*AC) = In(A) . 

This may be regaitled as Sylvester’s inertia theorem for 
normal matrices [36]. We sketch an alternative proof of this 
theorem below, by using the classical inertia theorem due to 
Sylvester. It is well known that if A is normal then 
In(A+A*) = In(A) . Applying this to C*AC, In(C*AC) = 
In(C*AC+C*A*C) = In(C*(A+A*) C) . Since C is nonsingular and 
A+A* is Hermitian, we have In(C*(A+A*) C) = In(A+A*) = In(A) 
completing the proof, 

A quantitative sharpening of this result will be given 
later in Section 2.6 where we discuss some more generalizations 
of Sylvester's law based on the work of Ostrowski [69,70] and 
Thompson [85,86]. 

The following theorem is the inertia analogue of 
Theorem 2,4.2, 

THEOREM 2.4.13. Two given normal matrices A and B 
have the same inertia, iff there exists a nonsingular matrix 
C such that C*Re(A)C = Re(B) . 
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Pr^of. Again, we employ the result that In(Re(A)) = 

In(A) when A is normal* By means of this result and 
Sylvester’s theorem the sufficiency part follows immediately. 

To prove the other part, we note that In(A) = In(B) 
implies In(Re(A)) = In(Re(B)) and hence also ©[Re(A)] = ©[Re(B)], 
since for Hermitian matrices equi-inertia property implies 
equiangidarity . Therefore by Theorem 2.2.1 there exists a 
nonsingular C as required. AJkA 

¥e have seen that the term "inertia" can be simply 
replaced by "angularity" in Sylvester’s theorem. However 
this is not possible with all inertia theorems. For example, 
if we let A = [^ "^3 and H = then Re(AH) > 0, 

but ©[a] 0[H] . This shows that "inertia" cannot be replaced 

by "angularity" in the main inertia theorem (refer Theorem 1.2.3). 

2.5. Angularity Theorems in Singular Case 

Referring to the results established in the preceding 
section, we now pass on to a discussion of the case where C 
is permitted to be singular and accordingly the concerned 
positive definite matrices are ass-umed to be positive 
semidefinite. Also, we extend our angularity results for 
the case when C is a rectangular matrix of order nxm. 

The first result in this sequence is the one corresponding 
to the basic lemma of the last section. 
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LEMMA 2.5.1. If B*AB and. C*AC are diagonal matrices, 
and B is nonsingular, then TtCC^AC) < 7i:(B*AB) . 

Proof, With reference to the notation and convention 
used in the proof of Lemma 2.4,1, first we shall prove that 
the first q columns of C are linearly independent 

even if C is singular. To prove this, let us assume that 

^k'^k ° (2.5.1) 

where C 2 ^,,..,c^ are scalars. We claim that Cj^=0 for 
k=l,...,q. From (2.5.1), we have 

^k=l^kV ^^kSlVp " (2.5.2) 

It IS not difficult to see that v^Av^ is the (i,o) element 
of C*AC. Since C*AC is diagonal, (2,5.2) reduces to 

(2.5.3) 

k=l ^ ^ 

and hence 

q p 

2 Icvl Re(\) = 0. (2.5.4) 

k=l ^ ^ 

Since Re(i^j^) > 0 for k=l,...,q, it follows that Cj^=0 for 
k=l,..,,q. We may now observe that the first half of the 
analysis of the proof of Lemma 2.4.1 constitutes the proof 
of this lemma. AAA 

By an arbitrary open half plane we mean a set of the 
i© 

form {e 3R^i a < 6 < a+rc}, a being an arbitrary real number. 

Assuming that B*AB and C*AC are diagonal and B is 
nonsingular, let us apply the above lemma to the diagonal 
matrices B*e^AB and C*e^^AC where a is real. 


As a result. 
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we infer that in any arbitrary open half plane, the number 
of eigenvalues of C*AC does not exceed the number of eigenvalues 
of B AB in that half plane. This may be expressed in the 
language of angularity as follows s 

THEOREiyi 2.5.1. If B*AB and C*AC are diagonal and B is 
nonsingular, then 


Z 

(jO 



S being any open half plane. 


(2.5.5) 


Corollaries 2. 4. 1-2. 4. 4 have similar analogues. 


An interesting feature of our principal result, namely 
Theorem 2.4.1 of the preceding section lies in the following 
fact. In any arbitrary open half plane if two matrices have 
equal number of eigenvalues then on any arbitrary proper ray 
also they have equal number of eigenvalues. Based on this, 
it is now tempting to conclude under the assumptions of -Theorem 
2.5.1 that 

9[C*AC] < 0[B*AB] for all to s , 

0)— 0) p*^ 

that is, 

0[C*AC] < e[B*AB] (2.5.6) 

in view of Defini-tion 2.2.4, Unfortunately it is not true. 

To substantiate this, a counterexample is given below. Let 

A = [1 qb, B = 1], c = [J °] 

SO that 

C*AC = diag(l,0) and B*AB = diag(2+2i,2-2i) . 



Evidently, for 01= ]R_^, e[C*AC]^ = 1 and e[B*AB](^ = 0 
showing that (2.5.6) does not hold. 
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Therefore, when C is permitted to he singular 
cannot be replaced by in Theorem 2.4.1 and its four 
corollaries. However such a replacement is valid in the case 
of angularity analogue of Sylvester' s theorem when C is permitted 
to be singular. To illustrate, we state the counterparts 
of Theorem 2.4.8(i) and (ii) in the singular and semidefinite 
case ass 

THEOREM 2,5.2. The following two statements are true 
and are equivalent to each other. 

(i) If C e and K s K^, then e[C*KC] < ©[k]. 

(ii) If P > 0 and K c K^, then 0[PK] < ©[k]. 

Proof, Since K and C*KC are normal, by the earlier 
theorem we find that in any arbitrary open half plane, the 

'K’ 

number of eigenvalues of C KC does not exceed the number of 
eigenvalues of K in that half plane. Moreover both K and 
C*KC have their eigenvalues on one and the same extended ray. 
Hence, it is clear that 6[ C*KC] < ©[k] for all tu s , 

CO Cu P 

This completes the proof of assertion (i) 

(i) <==> (ii) follows easily as in Theorem 2.4,8. AJkA 

Corresponding to the third statement in Theorem 2,4.8, 
we have the following analogue which requires the assumption 
that K be nonsingular and it follows from Theorem 2,5.2(ii). 
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THEOREM 2 . 5 . 3 . If K e is nonsingular and PK 2 0 
then e[P] < e[K*]„ 

In view of (2,2,1) and (2.2.2) the results of this section 
easily lead to the corresponding inertia theorems given below. 

THEOREM 2.5.4. The following two statements are true 
and are equivalent. 

(i) If C e M^ and K s then In(C*KC) < In(K) . 

(ii) If P > 0 and K e then In(PK) < In(K) . 

In this theorem, in particular, if K is considered as 
Hermitian then (i) and (ii) are respectively Theorem 2 in 
Ostrowski [ 69 ] and a special case of Corollary 4 in Ostrowski 
and Schneider [ 71 ]. 

Since In(K^) = In(K) , the inertia analogue to Theorem 
2 . 5.3 becomes 

THEOREM 2 . 5 , 5 . If K <s is nonsingular and PK 2 
then In(P) < In(K) . 

This may be compared with a particular case of a part 
of Lemma 2 in Carlson and Schneider [4?] . 

So far we have confined ourselves to the simple 
situation in which C is a square matrix. Now, we will drop 
this assumption and assume instead that C is an nxm matrix 
with m >, =, or, < n. Of course, the other matrices 
concerned are in M^. It can be easily seen that all the 
results involving C proved in this section remain valid. 
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Once we prove that Lemma 2,5.1 holds in rectangular case, 
then other results follow automatically since the same 
arguments can be applied as in the square case. We will 
therefore have to prove 

THEOREM 2.5.6. Suppose C e A and B c and B is 

nonsingular. If B AB and C AC are diagonal matrices then 
7t(C*AC) < tt;(B*AB) . 

Proof. The proof of Lemma 2.5,1 remains valid without 
change for this case also. However, by assuming that the 
result is valid for square case, we complete the proof of 
this theorem by making use of a simple idea employed by 
Oo while extending the quantitative formulation 
of Sylvester's law of inertia to the rectangular case. 

We have to consider three cases. The case m=n is 
simply Lemma 2.5.1. Now we consider the case m < n. Let 
us form C s by augmenting (n-m) columns, consisting of 
zeros to C so that C = [C O] . Since C*AC = q], it 

follows that itCC^AC) ~ vi(C AC) < 7i(B*AB) . 

Assuming m > n, now let us form the m xm matrices 
A = [q q], S = Cq I^^ ~ ^0^ '^here the block matrices are 

rsj 

of appropriate dimensions. Since C AC = C AC, 

'H* ^ 

B AB = q] and B is nonsingular, it follows that 

7t(C*AC) = %(c*aS) < %(b*AB) = 7i:(B*AB). AAA 

The inequality just established concerning tc(C*AC) and 
tiCB^AB) can further be generalized involving the rank of C. 
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For this we require the following lemma. 

LEMMA 2,5.2, Let K be a principal submatrix of order r 
of H e Then 

max( 0 , 7i(H)-n+r) < Tt(K) < min(r, uCH)), (2.5.7) 

This is an immediate consequence of the 
well-known interlacing Cauchy's inequalities [l 6 , p.ll9] for 
the eigenvalues of principal submatrices of Hermitian matrices 
If > . . . > are the eigenvalues of H and — •*•2 Pn-1 
are the eigenvalues of a principal submatrix G or order n-1 


of H, then 


so that 


2 ^ jL 2 * ^~1 * 9 n— 1 


tc(H )-1 < 7i(G) < %(E) . 

By a repeated application of ( 2 , 5 . 9 ) we havei 

'Jt(H)-n+r < 7 i;(K) _< 'n:(H) . 


Furthermore , 


0 < 7 r(K) < r. 


( 2 . 5 . 8 ) 


(2.5.9) 


(2.5.10) 


(2.5.11) 


From ( 2 , 5 , 10 ) and (2.5.11) the required result follows. 


THEOREM 2 . 5 . 7 . Let C be a matrix of order nxm and 
of rank r. If B*AB and C*AC are diagonal and B is invertible 
then 

max(0, TT (B*AB) -n+r) <. 7i;(C*AC) < min(r, 'n:(B*AB)) (2.5.12) 

Proof. Since C is of rank r, there exist nonsingular 
matrices X and Y (of order n, m respectively) such that 
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y 


where the block null matrices are of appropriate order 
[l8, p.177]. Let H = A+A* and H = X*HX. If H is written in 
the partitioned form (H. .) . . where , s M , Hpp <s M , 
then by usual simplifications 

n(C*HC) =Tt(% 3 _). (2.5,13) 

By the above lemma, 

max(0, T[;(H)-n+r) £ £ niin(r, 7t(H)) (2.5.14) 

and by Sylvester's law of inertia 

tc(H) = 7i(H) =‘rt(B*HB). (2.5.15) 

Since B*AB and C*AC are diagonal, 

7r(B*HB) = 7x(B*AB) and ti;(C'"HC) = tc(C*AC) . (2.5,16) 

Prom (2.5.13) -(2.5.16) , the required result follows. A4A 


COROLLARY 2,5.1, Let C be a matrix of order nxm 
with full row rank. If B*AB and C*AC are diagonal and B is 
invertible then Tt(C*AC) = ■ji(B*AB) . 

Proof. The proof is almost obvious. Since r=n, 
from (2.5.12), we have 

7i(B*AB) < 7i(C*AC) < 7i:(B*AB) 

and the corollary follows. ^ 

REt'IARK 2,5.1. Theorems 2.5.6 and 2.5.7 and Corollary 
2.5.1 are valid if the matrices concerned are normal 
instead of diagonal. 
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REMARK 2.5.2. Inequalities analogous to (2.5.12) can 
be easily obtained for the number of eigenvalues in any open 
half plane by considering Ae^‘^ in the place of A. 

RMARK 2.5.3. Applying Cauchy's inequalities mentioned 
earlier, a proof of mCS^HS) £ 7 i:(H) for a Hermitian H and a 
rectangular matrix S is sketched by Hill in [60]. 

2,6. Generalization of Sylvester’s Law of Inertia 

Recently, Thompson [86] obtained the following elegant 
result consisting of four inequalities which involve the 
eigenvalues of two Hermitian matrices A and C*AC and the 
singular values of the rectangular matrix C, i.e., the 

y *]| / ^ 

eigenvalues of the positive semidefinite matrix (C C) ' . 

THEOREM 2.6.1 (Thompson [86]). Let 

be the eigenvalues of A s H^, ®1 ... 2 singular 

values of C" m ^1 — • • • 2 Pm "the eigenvalues of C AC. 

Ifl<.i^m, li.il,n, i+i~l < m, then 

p. . _ < s?a . when a , >0, (2.6,l) 

1 + 3-1 ""13 3 ~~ 

P-.- T •« * when a , < 0. (2.6.2) 

'^1+3-1 - m+l-i 3 0 

Ifl^i^m, li.3i,n, i+3 > n, then 

8 . > s^a . when a . 2 (2.6,3) 

i+3-n “13 0 

P > s2 a when a .< 0, (2,6.4) 

i+3-n “ m+l-i 3 3 

These inequalities have a surprising amount of content, 
since they contain as special cases (i) Sylvester's law of 
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inertia (ii) its sharpened forms due to Ostrowski [69>70] and 
(iii) Cauchy's interlacing inequalities. 


Considering m=n and C nonsingular let us put i=l in 
(2.6.1) -(2.6.2) and i=n in (2.6.3) -(2,6.4) . Then we arrive 
at the inequalities 

®n°^j ~ ^0 — ^1*^0 when 2 0 (2.6,5) 

and 

s^ttj <, P j <. when £ 0 (2,6,6) 

or., equivalently 


where 


, 1 < d < n (2.6.7) 

<. 0. i. 1 < 0 < n . (2.6.8) 

n j J- 


In literature, the last mentioned result is known as 


Ostrowski's quantitative formulation of Sylvester’s law of 
inertia. An immediate inference from this result is that 
if A e and C is nonsingular then A and C*AC have the 
same number of positive, negative, and vanishing eigenvalues, 
which, of course, is Sylvester’s law of inertia. 


Ostrowski’s extension [70] of (2,6,5) -(2.6.6) for 


the case when C is nxm matrix is equivalent to 

p . < sfa . if > 0 

and 


6 > s^a 

^m+1-0 1 n+1-0 


if p 


m+1-0 


< 0 


(2.6.9) 

( 2 . 6 . 10 ) 


and these two inequalities are also deducible [86] from 
(2.6.l)-(2.6.4) , 
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I 

Finally, asstuning 'm < n and taking C = [ , we see that 

the singular values of C are s^=...=Sj^=l and C*AC is the 
leading principal submatrix of order m of A. With i=l, 
from (2.6.l)-.(2.6,2) we have 

p. £ a. for j=l,...,m. (2.6.11) 

U 

Similarly by taking i=m and substituting J+n-m for j in 
(2.6.3) -(2.6.4) it follows that 

P . 2 a 0=1 » . . . ,m. (2.6.12) 

0 "" 3+n-m 

Combination of (2,6,11) and (2.6,12) constitute the so-called 
Cauchy's interlacing inequalities. 

All these deductions show the importance of Thompson’ s 
inequalities (2. 6, l) -(2.6,4) . In what follows we shall give 
generalization of these inequalities treating A and C*AC as 
normal . 

THEOREM 2,6.2. Let A be a normal matrix with eigenvalues 
{a^} such that Re(a^) > ... > Re(a^) and C be an nxm matrix with 
singular values s^ > ... > Sjjj. Suppose C*AC is normal having 
the eigenvalues {P^} such that Re(P-^) 2 ••• 2 

If 1 £ i <. m, 1 £ d £ i+d-1 £ ni» then 

Re(Pi^j__l) £ s^Re(aj) when Re(a^) > 0, (2,6.13) 

Re(p . ) < s^ ^ .Re (a .) when Re(a .) < 0. (2.6.14) 

i+d-1 ~ m+l-i 0 0 

If 1 1 i 1 1 £ d £ n, i+d > n, then 

^®^^i+d-n^ ^ s?Re(aj) when Re(a^) > 0, 

r s£i.iRe(ajj)when Re(aj) < 0. 


(2.6.15) 

( 2 . 6 . 16 ) 
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• It is very simple. Since A is normal the 
eigenvalues of Re(A) i.e., (A+A*)/2 are Re(a .) , j=l,...,n. 

u 

Similarly the eigenvalues of C*ReiA)C, i.e., Re(C*AC) are 
Re(Pj) , j=ls...sm. Applying Thompson's theorem with Re(A) 
instead of A yields (2.6.13) -(2,6.16) . AAA 

As an immediate consequence of this result we have the 
following corollary which may be regarded as a sharpened 
form of Sylvester’s law for normal matrices (Theorem 2.4.12). 

COROLLARY 2.6.1. Let A and C*AC be nx n normal matrices 

with 0 nonsingular. If the eigenvalues of A, C*AC and the 

l/2 

positive definite matrix (C*C) are respectively 

and {sj} with Re(a 2 ) > ... 2 Re(a^) , Re(P 2 ^) 2 ••• 2 Re(l^) 

and s^ 2 • ♦ • 2 then 

Re(p^) = 0jRe(ap, 1 < d < n (2.6.17) 

where 

£ 0 • 1. s^pl 2 d 1 n. (2,6.18) 

n 1 1 

The above theorem and its corollary have similar analogues 
for the imaginary parts of the eigenvalues of A and C AC. 

For this we have to consider Im(A) , i.e., (A-A*)/2i, 

It may be noted that when the real parts of the eigenvalues 
are ordered decreasingly, then the corresponding imaginary 

IE 

parts need not be in the decreasing order. The ordering 
is considered separately for the real and imaginary parts. 
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We remark that if A is nonsingular and 6[A] = 6[u], then 
A need not be normal in general. To show this we take 
■^“^10^ that P = [q and ^ = [ 20 ^. It is easily 
verifiable that ©[A] = ©[U] . However A is not normal. 

Next, let us consider the nonnormal matrix A = [^ . 

For this, the polar factors are P = [f and U = [J 9], 

Since the eigenvalues of A are (2+iiv^)/2, obviously 
9 [a] ^ ©[U]. 

The above two examples indicate that when A is nonsingular 
and nonnormal, A and its polar unitary factor may or may not be 
equiangular. 

We have seen in Section 2.4 that when P > 0 and K is 
co-Hermitian then ©[PK] = ©[K] . Now it is interesting to 
inquire whether or not this result holds when K is normal in 
general. It is readily seen that the answer is in the 
negative, by taking P and K respectively as P and U of the 
second example given above. However, if PK is also assumed 
to be normal then it can be established that @[PK] = ©[K], 

Before coming to this result, we shall say something 
about the product of two normal matrices. It is well known 
that if two normal matrices commute then their product is 
also a normal matrix [30?]. But its converse is not true, 
since for A = [«,i and B = Ci 0 ^, A, B and AB are normal 
whereas AB ^ BA, 
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Applying the result of Wiegmann [307] that if A and B 
are normal then AB (and hence BA) is normal iff each one of 
A and B commutes with the Hermitian polar factor of the other, 
Cullen [286] has provided a sufficient condition on a normal 
matrix A so that for normal B, we have AB normal iff A and B 
commute . 

According to Cullen [286], A c is said to have 
modularly distinct eigenvalues if unequal eigenvalues of A 
have unequal moduli. 

THEOREM 2.7.2 (Cullen [286]) . If A and B are normal 
and one of the two has modularly distinct eigenvalues, then 
AB is normal iff A and B commute. 

One part of the theorem is obvious and for the other 
part we shall give a direct proof without applying Wiegmann ’s 
result stated earlier. Our proof depends on the following 
easy lemmas. 

LEMMA 2.7.1. Let (C.J. . o Be the partition form 
of a normal matrix C where C^^^ and C ^p are square matrices. 
Then llC 23 _l| » 11^12 H ^Bere l|x|| denotes the Frobenius norm 
{tr(X*X)}l/2, 

Proof. Since C*C = CC*, we have 

^I'^ll *^21^21 = *^1^1 ^2^2’ 

Taking "tra.C8 for boish. sides ^ “the resulf follows* AAA 
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COROLLARY 2.7.1, 


Let 



0 

C 22 


where 


hi °22 


^21 


0 and 


are square matrices. 
*^11*' ^22 normal. 


Then C is normal iff 


PTOOf . 
lemma. 


It is an immediate consequence of the above 

AAA 


LEMMA 2.7.2. Let D = diag(d^, . . . ,dj^) have modularly 
distinct eigenvalues with Id^^j < ... £ and let C be a 

normal matrix. Then if either CD or DC is normal then 
C and D commute. 

Proof, Let us assume that DC is normal with C = (c^^.). 
There may be two cases. First let us consider the case 
jd^l < 1 ^ 2 ! . Partitioning C and DC with 1x1 block in 
the leading position for both matrices, an application of the 
preceding lemma gives 


n ? ^ ,2 

( 2 . 7 . 1 ) 

and 


n , i2 ^ 1 |2 

^^^2 “ j=2 

(2.7.2) 

From these two relations, it follows that 


{(|dj2-|cL|2)|c Ih = 0. 

1=2 

(2.7.3) 

By the assumption that jd^l ^ jd 2 l, we have Id^l 

< jdj^l for 


i=2,,,,,n and hence it follows that c ^-[ -0 for j-2,,.,,n. 
Hence proving DC = CD reduces to a similar problem in 


dimension n-1 
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Next, we shall consider the possibility that 

I dll = Id^l = ... = Idj^l < (2.7.4) 

where m ^ n. Of course, when m=n, there is no inequality 

term in (2.7.4), Since D has modularly distinct eigenvalues, 

(2.7.4) implies that d]_=d 2 = . . . =djjj=d , say. Let us write 

D = diag(dl , D) and C = (C.^) . ^ in partitioned forms 

where C_- c M . Now, 

11 m 


DC = 


DCgji DCgg 


Since C and DC are normal, evidently 

IICjill = IlCiall ^2.7.5) 

and 

I^Cgill = l|dCj^2ll* (2.7.6) 

Hence 

ipCg^ll = fdlllCgJI (2.7.7) 

which may be expressed in the form 

z z l(|dj2-|a|^)|0iJh = 0. (2.7.8) 

j=l i=m+l ^ 

By the hypothesis, |dj|^| > )dl for i=m+l,...,n. Thus 

(2.7.8) yields that C 22 = 0 and hence we have C 22 ~ 

Again we see that the problem of showing CD = DC reduces 
to a similar problem of lower dimension and in this case 


it is of dimension n-m. 

Since n is finite and the lemma is true for the set of 
1x1 matrices, the result follows. In the same manner one 



102 

can proceed to prove the result if CD is assumed to he 
normal. aaa 

Now We shall complete the proof of Theorem 2,7.2, Let 
us assume that A has modularly distinct eigenvalues. Then 
there exisits a unitary matrix V such that V*AY = D = 
diagCd^, . . . jdj^) with entries as described in the above lemma. 
Denoting V*BV by C, AB becomes VDCrv*. Now C and DC are 
normal. Applying the preceding lemma, it follows that 
CD = DC and therefore, AB = BA. The same argument remains 
valid if we interchange the role of A and B. The proof of 
Theorem 2.7.2 is now complete. AAA. 

Since positive semidefinite matrices always have 
modularly distinct eigenvalues, as an application of 
Theorem 2,7.2, we have the following theorem. 

THEOREM 2,7,3. Let P 2 ^ normal such that 

PK is also normal. Then 

(i) ©[PK] = e[K] if P > 0 and 
(ii) ©[PK] < ©[K] if P > 0. 

Proof. By Cullen’s theorem PK = KP and hence the 
argument given in Theorem 2.7.1 is applicable and the theorem 
follows at once. 

We remark that this theorem seems to be a generalization 


of Theorem 2,7.1. 
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2,8. Totally Normal Matrices 

In this Section and the following one, we shall deal with 
normal matrices having some normal principal submatrices. 

In analogy with positive definite matrices, i.e., matrices 
all whose leading principal submatrices are again positive 
definite (in fact, all whose principal submatrices are positive 
definite) and with totally nonnegative (positive) matrices, 
i.e., matrices whose minors of all orders are nonnegative 
(positive) [8 , p.lis], we define totally normal matrices as 
given below. The notion would allow us to make a 
generalization of Cauchy’s interlacing inequalities. 

DEFINITION 2,8.1. A e is said to be totally normal 
if every principal submatrix of A is noimal. 

It is clear that totally normal matrices are normal and 
not conversely. Obvious examples of totally normal matrices 
are diagonal matrices and co-Hermitian matrices. 

Listed below are some observations about totally normal 
matrices . 

(1) If A s M^ and B « totally normal matrices, 

then their direct sum is also totally normal. 

(2) We know that if A c and B s Mjj^ are normal, then 
their Krone cker product A@B is normal [l6, p.70]. However, 
similar result is not true for totally normal matrices. 

For example, A = [_^ and B [| |] are totally normal, 
but A0B is not totally normal. 
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(3) Another well-known fact about normal matrices is 
that the sum as well as the product of two commuting normial 
matrices are again normal . This type of result is not true 
‘for totally normal matrices. To see this, consider the 
totally normal matrices 



“1-1 1“ 


“2 1 1“ 

A = 

11-1 

and B = 

12 1 


-111 


12 2 


In view of the circulant nature of A and B here, they commute. 
Simple calculations reveal that neither A+B nor AB is totally 
normal . 

The construction of counterexamples of this type provides 
a justification for the study of certain special classes of 
matrices (e,g. circulants) . 

(4) If A is totally normal then A-zI is also totally normal 
where z is any complex number. Hence, by virtue of Lemma 
2.4,2, we conclude that a normal matrix having all its 
eigenvalues along a straight line is totally normal. 

(5) If A is totally normal and U is unitary, then U*AU 

need not be totally normal in general. However, if U happens 

to be a permutation matrix then U*AU is always totally normal. 

(6) If A is totally normal and of rank r, then there 
exists a permutation matrix P such that P*AP has a nonsingular 
normal matrix of order r in the top left corner. To prove 
this it suffices to show that there exists a nonsingular 
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principal submatrix of order r of A, Suppose if all the 
r-th order principal submatrices are singular, then the 
characteristic polynomial reduces to 

+ ... + 

where Sj^ denotes the sum of all principal minors of order k 
of A [l3, p.5^]. This shows that A has at least (n-r+l) 
vanishing eigenvalues which is a contradiction since any normal 
matrix of rank r has precisely (n-r) zero eigenvalues. 


In what follows first we shall give a characterization 
of totally normal matrices. As a prelude, we study the 
2x2 and 3x3 situations and subsequently we take up the 
general case. 


LEMMA 2.8.1. 


^ “ ^®rsb,s=l,2 

K2I ' 


is totally normal iff 

(2.8.1) 


and 

^ 21^12 = ^ 12^2 

where 

^12 "" ^11 " ^ 22 * 


(2.8.2) 

(2.8.3) 


Proof. Note that a 2x2 matrix is totally normal iff 
it is normal. Now, by equating the corresponding elements 
in AA* and A* A the result follows. ^ 

Here is another form of this lemma: 


12 


i^^expd^) 

(2.8.4) 

.^2e3cp(i0) 

(2.8.5) 


and 
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for some real 0 with defined as in (2.8.3). 

In the following lemma, n=3. However, for future 
reference, the lemma is framed in terms of n. 


LEMMA 2.8.2. A = (aj,s) ^ (with n=3) is totally 
normal iff for every r and s satisfying 1 £ r < s £ n, 

I 1 ~ ^ ®^sr ^ ’ 


^sr^rs “ ^rs'^rs 


( 2 . 8 . 6 ) 

(2.8.7) 


and 


^rk^sk “ ^kr^ks ^ {l,2, . , . ,n}\^{r,s} (2.8.8) 


where 


‘^rs ^rr“ ^s s * 


(2.8.9) 


Proof, In view of the lemma proved just now, it is 
evident that the conditions given in (2.8.6) and (2.8.7) 
are necessary and sufficient to ensure the normality of all 
the three second order principal suhmatrices of A. 

Now by equating the entries in (l,2) position of AA* 
and A*A we have 

3 __ 3 _ 

kSi ^lk^2k kSi ^kl^k2* (2.8.10) 

By subtracting the equation (2.8.7) with r=l, s=2 from 
(2.8.10) , we find 

a-j ^ ^23 * ^3 1 ^ 3 2 * ( 2 . 8 . 11 ) 

Similarly by considering the elements in (1,3) and (2,3) 
positions in the products AA* and A*A and by making use of 

( 2 . 8 . 7 ) with suitable r and s, the other two conditions in 

( 2 . 8 . 8 ) follow. This completes the proof. AAA 
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We now tiirn to the characterization of nxn (n > 3) 
totally normal matrices. 

THEOREM 2.8.1. Ac M^^, n 2 3 is totally normal iff 
all its third order principal submatrices are totally normal,, 
or equivalently, iff all its second and third order principal 
submatrices are normal. 

Proof. One part is obvious. To prove the other part 
let us assimie that all the third order principal submatrices 
of A are totally normal. By applying the preceding lemma 
to each one of the third order principal submatrices, we see 
that for every r and s satisfying 1 £ r < s £ n, (2.8.6) -(2,8 .9) 
hold. 

Following Marcus and Mine [l6. Chapter I, Section 2], 
let 0^ 1 £ m £ n denote the set of all strictly increasing 

sequences of m integers chosen from l,2,,..,n. If 
a = (i^, . . . ,ijjj) « designate the principal submatrix 

of A lying in rows a and columns a with the notation A[aja], 

We are throu^ if we show that A[ala] is normal for every 
a «= ni=l,...,n. We shall, therefore, consider one 

such A[aja] for an arbitrary m c {l,.,.,n} and denote it by B. 

If B = (b^g) (r,s=l, ,. ,,m) then b^g = ^iji^is* 
using the formula BB* = B*B for B to be normal and taking 
advantage of the Hermitian nature of these products, we see 
the necessary and sufficient conditions for B to be normal as 

k£l ^rk^sk ® k£l ^kr^ks» s £ m. (2*8.12) 
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The above relation for the case r=s reduces to 

m p 

kii (l^skl “ll^ksl ) =0* (2.8.13) 

In case r;^s , (2.8.12) can be rewritten as 

m _ _ _ ^ _ 

(^rk^sk“^kr^ks) ^sr^^rr'^ss^ “ ^rs^^rr”^ss^ ~ 
k?r,s (2.8.14) 

We know that b„_ = a^ ^ . Moreover, i^ < i„ whenever r < s. 

x o O 

Hence the relations (2.8.13) and (2.8.14) are certainly valid 
in view of (2.8.6) -(2.8.9) . Thus, B is normal. This 
completes the proof of the theorem. AAA 

From the proof, we observe that an equivalent form 
of the above theorem is Lemma 2.8.2 with n 2 3. 

COROLLARY 2.8.1. For any totally normal matrix A = (a ) 
having no zero off-diagonal element, there exists a real 0 
such that 

^rs Ss3.exp(i0), 1 ^ r < s £ n (2.8.15) 

and 

drg = d^gexp(i0) , 1 < r < s < n (2.8.16) 

where dj^s is defined as in (2.8.9) . 

Proof. In view of (2.8,4)-(2,8.5) * the conditions 

r*Jr\*rsjirsjrsj 

(2.8.6) -(2.8.7) can be rephrased as 

a^pg = agj,exp(i0j,g) (2.8.17) 

and 

^rs ' ^rs®^^^^rs^ (2.8.18) 

for some real 0 ,l<r<s<n. By considering r < s < k, 

rs “ 
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let us substitute for and agj^ in (2.8.8) by using (2.8.17). 
This yields 


Sj^^exp ( i0^^) exp ( -10,,^) = 


rk' ks 


sk*' 


kr ks * 


(2.8.19) 


Since the off-diagonal elements are assumed to be nonzero, 
it follov/s that 0^^^ = 0^^^ for r < s < k, with the convention 
that for all r and s, 0 _< 0^^ <271. By a similar procedure, 
it can be easily proved that 0^^^ = 0^^^ for k < r < s. 

Hence we have 0^^ = 0 say, for all r and s satisfying 
1 i. ^ ^ s £ n. The proof of the corollary is now complete. 


REtiARK 2.8,1. Any matrix A = (a^g) satisfying (2,8.15) 
and (2,8,l6) is of course totally normal even if some 
off-diagonal elements are zero, because these conditions 
meet the sufficient requirements (2,8,6) -(2,8,9) for a 
matrix to be totally normal. 


If we assume that only all the leading principal 

submatrices of A are given to be normal, it can be easily 

seen by routine manipulations that (2,8,6) holds for all 

r and sj also for certain values of r, s, k, (2.8,8) holds. 

This motivates to know whether the normality of all leading 

principal submatrices will suffice to say that A is totally 

normal. It cannot be so, however, since in 

”-l 1 1“ 

1-11 
-1 -1 0 

all the three leading principal submatrices are normal but 
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the principal submatrix J] is not noimal. 

We conclude this section by stating Cauchy's interlacing 
inequalities for principal submatrices of totally normal 
matrices . 

THEOREM 2,8.2, Let B be any principal submatrix of 

order m of a totally normal matrix A e M , If the 

n 

eigenvalues of A and B are respectively {a.} and {p^} such 

0 u 

that Re(aj^) >; ... 2 and ReCp^^) 2 ••• 2 then 

Re(aj) 2 ReCPj) 2 j+n-m^ » j=l,...,m. (2.8.20) 

Proof. It follows by applying the same principle used 
in Theorem 2.6,2, AAA 

In fact, for the inequalities in (2,8,20) to hold, it is 
enough that A and B are normal. 

2,9. Angularity and Inertia of a Partitioned Normal Matrix 

We have already seen in the survey chapter that if 
^^ij^i 0=1 2 partitioned form of H c where is 

nonsingular, then 

In(H) = In(Hqq) + In(K 22 ) , 

K 22 being the Schur complement of in H given by 

^22 “ ^22 " ^ 2^1 A2 ' 

This interesting inertia result is due to Haynsworth [57]. 

In [ 1 , p. 97 ], it has been remarked by Barnett that no 
result of this nature is known for non-Hermitian matrices . 
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In this section^ as an application of our basic angularity 
theorem (Theorem 2,4,l) , we prove some results for angularity 
and inertia of partitioned normal matrices, when the block 
matrices fulfil certain conditions. 


To obtain our main result of this section, we require 
the following lemma. 


LEMMA 2.9.1. Let A be an nxn normal matrix partitioned 
in the form (A. .) . . „ where and k^ 2 . normal, 

A^t is nonsingular and 


^1^12 " •S.1'^21* 


(2.9.1) 


Then the Schur complement B22 ■%]_ in A defined by 


®22 “ ^^22 “ ^ 21 ^ 1^2 


is also normal. 


Proof. 

Since and are normal we have 



'' S . 1'^1 “ "^ 1 ^ 

( 2 . 9 . 2 ) 

and 

^2^22 ^ ^22^22. 

(2.9.3) 

In view of 

(2.9.l)-(2.9.3) , the noimality of A yields 



■'^ 12'^12 ^ -^ 21^21 

(2.9.4) 


A A* = A* A ^ 

(2.9.5) 


21 21 12 12 

and 

% 2''^22 “ '^ 1 -^ 22 * 

( 2 . 9 . 6 ) 

Also, from 

(2.9.1) and (2.9.2) we have 



^ 1^2 “ “^ 1^1 * 

(2.9.7) 
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Now, by making use of (2.9.3), (2.9.4), (2.9.6) and (2,9,7) 
it can be easily verified that 

® 22®22 " ® 22 ® 22 ‘ 

This completes the proof of the lemma. 


REMARK 2.9.1. In order that B 22 be normal, the 
assmption (2.9.1) cannot be dropped in general, as may be 
seen by an example. Consider 

1 1 

with = [ 1 ]. 


Then 


A = 


-111 
-1 -1 1 


V Li - LhL][i 1] = Ll P 


Which is obviously not normal. 


We can now prove the principal result of this section. 


THEOREM 2.9.1. Under the hypotheses of Lemma 2.9.1, 

e[A](, = ©[Aii]^, + ©[B223^, for all 0) <5 fi . 

Proof. The proof is almost along the same lines as in 
Theorem 1 of Haynsworth [57] . Let 

« “-^ 1^2 1 


m being the order of A^^. -By direct computations we have 


C*AC = 



0 " 

®22 
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where L = A 22 - ■^2'^1'^1‘ Because of (2.9.7) » L vanishes. 
Hence, by means of the preceding lemma, C^AC becomes normal. 
Now by applying Corollary 2.4.3, the angularity result stated 
in the theorem follows at once. 


As an immediate consequence of this theorem we have 
COROLLARY 2,9.1. Under the hypotheses of Lemma 2.9.1, 
In ( a ) = In(A^^) + 1^(622) . 


The last mentioned formula shows that the inertia 
of a normal matrix can be determined as the sum of the 
inertias of two lower order normal matrices , Althou^ 
this formula is applicable only to a certain type of normal 
matrices, it is of course, a generalization of Haynsworth' s 
result . 



COROLLARY 2.9.2. If A = 

is nonsingular and normal, 
In(A) = In(A^^) 


^1 hz 

J-21 _ 

then 


is normal where 



Prc^, Since the normality of the given form of A 
includes the condition A^^A ^2 “ corollary 

follows from the previous one. 


COROLLARY 2,9.3. If A is as in the preceding corollary 
with an additional assumption that A 2 ^ = A^^ then 

71 (A) < m-5(A^^) 

'’(a) < m-5(A^^) 


where m is the order of A^j* 
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Note that, by Lemma 2.9.1, -■422A£242_2 
A^^2( *”^11^ ^2 normal. It is given that -4^^ is normal. 
Hence, in view of Remark 2.5.1, we shall apply Theorem 2,5*6, 
so that 

Therefore, from the formula given in Corollary 2,9.2, we have 

it(A) < 

= m - . 

Similarly we can prove the other part. AAA 


We shall now present analogous statements of several 
results in [57], for partitioned normal matrices. By 
appealing to the familiar formula that In(A+A*) = In(A) for 
a normal A, these results can be easily proved. In the 
following theorems and corollaries of this section, by 
In(A) > (a,b,c) we shall mean ^ a, v(A) > b and 

5(a) 2 n as in [57]. 

THEOREM 2.9.2. If (A^^) partitioned 

form of a normal matrix A with A^^. normal and In(A22) = (p»q,0) 
then In(A) 2 (p»q>0) . 


THEOREM 2,9.3. 


If and A^ 


are two normal principal 


submatrices of orders k and m respectively, of a normal 


matrix A and 
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In(Ap = (p|^,qj^,0) 

then 

ln(A) _> (p,q,0) 

where 

p = max(p^,p^) and q = max(qj^,qj^) . 

COROLLARY 2.9.4. If a noimal matrix A has a positive 
stable normal principal submatrix of order p and a negative 
stable normal principal submatrix of order q, then 
In(A) > (p,q,0). 

COROLLARY 2.9,5. If a normal matrix has a diagonal 
element with positive (negative) real part, then it has at 
least one eigenvalue with positive (negative) real part. 

COROLLARY 2,9.6. If A is an nxn normal matrix in the 
partitioned form (A^j) A 22 a^re 

respectively positive and negative stable normal matrices, 
then In(A) = (m,n-m,0) , m being the order of A^p. 

For all the results derived in this section, the 
matrix A and some of its principal submatrices are assumed 
to be normal. Hence these results are well applicable 
to totally normal matrices provided the block matrices meet 
other requirements stated in the theorems and corollaries. 



3. LINEAR TRANSFORMATIONS WITH INVARIANTS 


3.1. Introduction 

Many problems in applied mathematics involve the study 
of transformations, that is the way in v^hich certain input 
data is transformed into an output data. In many situations, 
these transformations are linear and moreover linear algebra 
is essentially the study of linear transformations. 

Another key concept in algebra is the notion of invariance. 
If T is a linear transformation of V into itself then a 
function f defined on V is said to be invariant under the 
operator T if 

f(T(x)) = f(x) for every x s V. (3.1.1) 

In certain situations one might wish to simplify an original 
problem through a transformation but with a restriction 
that a given character of elements in V remain unchanged 
in the process; while in other situations one may wish to 
infer some property of x from that of Tx which may be 
available as a data. Furthermore, while analysing various 
mathematical problems one often canes across relations of type 
(3.1.1). In all such cases structure theorems for T turn 
out to be very useful. This motivates the study of 
determining the structure of all linear operators on 
Mjj, ^ etc. which map certain subsets into themselves 

and at the same time leave certain quantities invariant. 
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In Section 3.2 of this chapter we characterize the 
strucrture of linear operators on having inertia and 
angularity as invariants over the classes of Hermitian and 
normal matrices. Sections 3.3 and 3.4 are devoted to the 
study of matrix transformations which preserve number of sign 

n 

changes (variations) and nondecreasing trend of vectors in JR • 
In Section 3.5, we deal with linear operators on which 
preserve inertia, angularity and number of zero components. 
Using some of the results of Section 3.5, in Section 3.6 we 
determine the structure of linear transformations which map 
circulants into circulants and preserve their inertia and 
angularity. 

3.2. Inertia- and Angularity -preserving Transformations on 
and 

In this section we consider the following three problems 
of characterizing all linear transformations T of into 
such that 

(a) T(Hj^) c and In(T(H)) = In(H) for all H s 

(b) ©[T(N)] = ©[n] for all N s and 

(c) T(N^) c and In(T(N)) = In(N) for all N s 
where and respectively denote the classes of all 
Hermitian and normal matrices in 

As will be seen these characterizations are interdependent 
at least in the sense that the proof in the case (c) depends 
on the characterization in the case (b) and that the proof 
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in the case (h) depends on the characterization in the case (a) . 
Another interesting fact which emerges is that the operators T 
have the same structure in the latter two problems. 

THEOREM 3.2.1. Let T: be a linear transformation. 

Then, for all H s H^, we have T(H) s and In(T(H)) = In(H) 
iff there exists a fixed nonsingular matrix C such that 

T(A) = C*AC for all A s (3.2.1) 

or 

T(a) = C*A’C for all A s . (3 .2,2) 

Proof. The sufficiency follows from Sylvester’s theorem. 

For the necessity, note that T(I) must be positive definite, 

I denoting the identity matrix. Let Q denote the positive 

definite square root of (T(I)) and set S(A) = Q*T(A)Q for 

every A s Then, if K is Hermitian, so is S(K) , and, by 

Sylvester’s theorem In(S(K)) = In(K) , In particiolar, if H is 
Hermitian, 

6(H-\I) = a(S(H-M)) = 5(S(H)-M) for all real\ (3.2.3) 
because H-^ is Hermitian and S(I) = I. But all the 
eigenvalues of H and S(H) are real; so (3.2,3) implies that 
H and S(H) have the same eigenvalues with the same mioltiplicities. 
Now Theorem 1,3.^ of Marcus and Moyls becomes applicable, and 
therefore 

S(A) = U*AU for all A c 
or 

S(A) = U*A'U for all A c 
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for some unitary U, Setting C = UQ“^, Theorem 3.2.1 follows. 

A6A 

Let us now turn to a discussion of linear transformations 


T on mapping normal matrices into themselves and preserving 
inertia for each normal matrix. In the problem of directly 
determining the structure of inertia-preserving transformation 
T,we face some difficulties whereas in the corresponding problem 
of angularity-preserving transformations there is no such 
difficulty. At the same time we are able to determine the 
structure of inertia-preserving transformations from that of 
angularity -preserving transformations, thus justifying our 
remark in the beginning of the previous chapter that working 
with angularity will be easier than with inertia in certain 
situations. Therefore, we shall first characterize the 
angularity-preserving linear transformations on 

THEOREM 3.2.2. Let T: -*■ be a linear transformation. 

Then for all N « we have T(N) « and e[T(N)] = e[N] iff 
there exists a fixed matrix C such that C is a nonzero scalar 
multiple of a unitary matrix and T has the form (3.2.1) or 


( 3 . 2 . 2 ) . 

^g^^gof. The sufficiency is obvious. For, if 
T(A) = kU*AU for all A c or T(A) = kUVU for all A s 


with k positive and U unitary, then T(N) is normal for every 
normal N. Moreover, by Corollary 2.4.3, ©[t(N)] = ©[N] for 


all N « 

Next, for the necessity, let H be Hermitian, and note 
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that: T(H) must he a normal matrix with real spectrum, that is, 
Hermitian. Furthermore In(T(H)) = In(H) , and so, hy Theorem 
3.2.1, there exists a nonsingular matrix C such that T 
either maps all A s into C*AC or into C*A* C. Theorem 2,4.5 
now shows that C is a nonzero multiple of a unitary matrix. 

In order to deduce the structure of inertia-preserving 
linear transformations on from the above theorem we shall 
prove the following theorem, 

THEOREM 3.2,3. Let T: -► be a linear transformation. 

Then T preserves inertia for each N s iff it preserves 
angularity for each N c N^. 

Proof. If T preserves angularity evidently it preserves 
inertia. To prove the converse let us assume that In(T(N)) = 
In(N) for all N <= Nj^, Consider any real number a. For 
N ss <s and hence by making use of the linearity 

of T, In(e^T(N)) = In(T(e^°^N)) = In(e^N) which in turn implies 
that 7 t(e^°^T(N)) = nCe^N) for any arbitrary real a. This 
is equivalent to saying that T(N) and N have equal number of 
eigenvalues in any arbitrary open half plane. Now applying 
the same technique used in the proof of Theorem 2.4.1, it 
follows that e[T(N)] = e[N] for all N = Nn. This completes 

the proof of the theorem. 

remark 3,2.1. In fact, the theorem jnst proved is true 
even if is replaced by any other class in having the 
property that is closed under scalar multiplication. 
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As an immediate consequence of Theorems 3.2.2 and 3.2.3 
we have 

THEOREM 3.2.4. Let Ts M^ be a linear transformation. 

Then for all N cs we have T(N) s and In(T(M)) = In(N) iff 
there exists a fixed matrix C such that C = kU where k/^0 and U 
is a unitary matrix and T has the form (3.2.1) or (3.2.2) . 

3.3. Structure of VP^ and VP^ Matrices 

The concept of variation-diminishing transformations was 
introduced by Schoenberg in 1930 , an exhaustive account of which, 
including a wide range of applications is available in Karlin 
[ll]. Gantmacher and Krein [ 9 ] elaborated rather completely 
the various characteristic forms of matrix variation-diminishing 
transformations. Some particularly interesting papers in this 
connection are due to Gantmacher and Krein, Karlin and MacGregor, 
and P61ya and Schoenberg as cited in Bellman [4, p.3l4]. In 
this context, the structure of variation-preserving linear 
operator L; C[0,l] C[0,l], C[o,l] denoting the space of all 

continuous real valued functions on the closed interval [ 0 ,l], 
has recently been determined by Rathore [ 300 ]. In this 
section and the following one, we propose to study the structure 
of matrix variation-preserving transformations Aj ]R^ -*]R^. 

As will be seen in the next section, the results in the 
discrete case involve some surprises as compared to the 


continuous case. 
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To begin our study, we shall recall certain well-known 

definitions. A = (a. .) <= M (R) is called nonnegative if 

10 n 

> 0 for i,o=l,...,n and we write A 2 0. Similarly 
a vector x = is said to be nonnegative 

if 2 0 for i=l,...,n and we write x > 0. Similarly we 
may conceive of the relations A < 0, x<0, x>0 etc. For 
X, y s3R^, the relation x 2 y is equivalent to the statement 
x-y 2 

Throughout the present section and the next, we use A 2 ^ 
to denote nonnegative matrix A (and not positive semidefinite 
matrix A as used elsewhere) . Moreover all the matrices and 
vectors encountered in these two sections are assumed to be in 
M^CR) and Irrespectively. As usual, ej_ (i»l,...,n) will 
denote the vector whose i-th coordinate is 1 and whose other 
coordinates are all zero. We shall represent the vector 
having all coordinates as 1 by e. Following Marcus and Mine 
[l6], we denote the i-th row of A by A^^ and the o-th column 
of A by A^^^ and it may be noted that Ae^ = A By (Ax)^ 

we mean the i-th coordinate of the vector Ax. If A 2 0 

and Ae»e, that is, every row sum of A is 1, then A is said 
to be row stochastic. 

Here are some more definitions closely related to the 

matrix variation-preserving transformations. 

.T 

definition 3.3.1. The variation v(x) of x = (x^,...,Xj^; 

(= m" is defined as the total number of sign changes in the 
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sequence discarding the zero components in the 

sequence. 

Thus for example, if x = (0,-1, 9, 4,0, -5,0)^, then v(x)=2. 
Note that v(x)=0 iff x > 0 or x < 0. 

DEFINITION 3.3.2. As M^OR) is said to be 
variation-preserving of order m (abbreviated , m being 
a nonnegative integer less than n, if v(x) < m for x 
always Implies 

v(Ax) = v(x) . (3.3.1) 

If (3.3.1) holds for all x s]R^, then A is said to be 
variation-preserving (VP) , 

Obviously VP^^ implies VP^_^ for m=l,2, . . . ,n-l. Of colors e, 
VP and sltb one and the same. 

DEFINITION 3.3.3. x = (x^,...,x^)^ is said to be 

nondecreasing if x^^ < . . . <. x^ and we write x is f. It is 

nonincreasing if x^ 2 • • • 2 sind we write x is By a 

monotonic vector we mean either nondecreasing or nonincreasing 
vector. 

DEFIOTTION 3.3.4. As Mjj(E) is said to be 
monotone-preserving (MP) if Ax is monotonic for all 

. -n^n 

monotonic x sJR . 

DEFINITION 3.3.5. As M^OR) is said to be 
trend-preserving (TRP) if x is T implies Ax is T (oi* 
equivalently x is i implies Ax is i) . It is said to be 
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trend-reversing (TRR) if x is t implies Ax is J, (or 
equivalently x is J, implies Ax is f) . A is said to be 
consistently monotone-preserving (CMP) if it is either 
trend-preserving or trend-reversing. 

Now we shall state some of the results obtained by 
Rathore [300] pertaining to variation-preserving operators 
in the continuous case, which are relevant to our study of 
such transformations in the discrete case. The definitions 
of v(f) , that is, the variation of f c C[0,l], and VP 
are similar to those in the discrete case. However, in the 
continuous case n does not come into the picture. 

THEOREM 3.3.1 [300, Theorem l]. Let^C[0,l] - C[0,l] 
bo linear. Then L is VPq iff L is positive or negative or 
of the form 

L(f|x) = 0(f)lll(x), f c C[0,l], X « [0,1] (3.3.2) 

where 0 is a linear functional on C[0,l] and l|) s; C[0,l] is 
a nonnegative function. 

THEOREM 3.3.2 [3OO, Theorem 2]. For a linear operator 
L: C[0,l] “*• C[0,l3, the following statements are equivalent. 

(a) L is VP, 

(b) L is VPg. 

(c) For some n, C c C[0,l] and satisfying (i) v(n)=0 
(ii) C is monotone onto [0,l] and (iii) n vanishes 
on an interval [a,p] c [0,l] only if 5 is constant 
on Ca,p], there holds 
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L(fjx) = n(x)f(c(x)) (3.3.3) 

for all X s [0,l] and f « C[0i,l]. 

The object of this section is to proi?-e the analogues of 
the above tv/o theorems in the discrete case. 

It has been remarked in [300] that in the continuous case 
that VE^ < ==5> VP for m >_ 2 . Furthermore , it is shown that 
m=2 is the least such integer, by providing an example of a 
linear operator which is VP^ but not VP^. The validity of 
such a remark in the case of ki will be investigated 

in the next section. 

In the following theorem we characterize VP^ matrix 
transfniinations . This result serves as a basic tool in the 
proof of other matrix variation-preserving theorems. 

THEOREM 3.3.3. A = « M^^OR) is VP^ iff A >, 0 or 

A £ 0 or there exist nonzero vectors u, w with u 2 ^ 

ouch that Ax = (w'^x)u for all x 

Proof. The "if" part is obvious. Therefore, let A be 
VE^ but neither nonnegative nor nonpositive. Then we claim 
that theare exist x, y both nonnegative such that Ax 2 0 

while Ay < 0 with neither Ax nor Ay being a null vector. By 
our assumptions, there exist p, q, r, s such that a^^ > 0 
and < 0. Since A is VP^ and > 0 it follows that 

0* Similarly Ae^ < 0. Moreover (Aeq)^ and 
Hence our claim is true. 


are nonzero. 
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Now let x,y be any such two vectors satisfying our claim. 
For c c (O jcxd) j, our choice of x,y shows that cx+y ^0, Hence 
cAx+Ay is nonnegative for sufficiently large values of c and. 
nonpositive for sufficiently small values of c. Define 
Cq = sup{cj cAx+Ay < 0}. Clearly c^ exists and is positive. 
Moreover for e>0. 


(co“e)Ax + Ay <_ 0 

(3.3.4) 

and 


(cQ4-e)Ax +Ay > 0. 

(3.3.5) 

Together (3.3.4) and (3.3.5) imply that 


-sAx < CQAx+Ay < eAx. 

(3.3.6) 

Since £>0 is arbitrary, it follows that 


CqAx + Ay = 0. 

(3.3.7) 

Notice that Cq depends on x and y. Replacing 

Cq by c(x,y) 

and fixing x as Sq in (3.3.7) we find that 


Ay = -c(eq,y)Ae^ 

(3.3.8) 

and hence 


Ay = a(y)u 

(3.3.9) 

with u “ Ae and a(y) being a scalar depending 

q 

take y as eg in (3.3.7) then 

on y. If we 

Ax = - > "r Ae 

c(x,eg) s 


c(e„,e„) 

^ g — §- Ae . 

c(x,eg) q 


Hence, 

Ax * a(x)u. 

(3.3.10) 
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By VPq property of A, any nonnegative z s is of the 
type X (i.e. with Az y 0 and Az f^O) or of the type y 
(i.e. with Az _< 0 and Az ^ 0) or such that Az = 0. Hence 
it is clear that for any z ^ 0 inH^, Az is of the form 
a(z)u for some fixed u 2 0 inB^ with u / 0. Since any 

vector can be expressed as the difference of two nonnegative 
vectors, the very same form given in (3 •3.10) holds for any 
X «=3R . Since Up = (Aeq)p = 0, by equating the 

p-th coordinates of Ax and a(x)u, we have 


A X = a(x) a 

(p) pq 


and hence 


a(x) = (a )“^A, vX. 

pq Cp) 


(3.3.11) 

Now by setting w"^ = (apq)~^A^^^, the theorem follows. 

As an almost immediate consequence of this theorem we 


have tho following well-known corollary. 

corollary 3.3.1. Let A «= M^OR) . Then Ax > 0 for all 
X > 0 inB’^ iff A > 0. 

mm 

In the sequel whUe discussing VP 2 matrices xa MnCR) 
we assume that n 2 3. Similarly for matrices we take 
ni 2. The reason for these restrictions is obvious. 

In order to prove our main theorem of this section, 
require the following two lemmas. 

LIMA 3 . 3 . 1 . Let B he row stochastic and VPg. Then 
B is consistently monotone-preserving (CMP) . 
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Pjiogf, First we shall prove that B is monotone-preserving. 

Without loss of generality, it is sufficient to consider an 

T Xi 

arbitrary nondecreasing vector x = > sind show 

that Bx is monotonic. If x^= x^, then x is a constant 
multiple of e and in this case Bx is trivially monotonic. 

So we assume that x^ < x^. Now 


1 if 


v(x-ce) { Q otherwise. 


c < x^, 


Since Be*e and B is VP^, 


it follows that 


v(Bx-ce) 



1 if < O < Xjj, 

0 otherwise. 


If Bx is not monotonic, it will have at least one set of 
throe coordinates (Bx)^^, k=p,q,r (p<q<r) such that 

(Bx) > M or (Bx) < m 
q T. 


where 


M = max { (Bx) , (Bx)^} 

ir 

m a min { (Bx) , 

Jr 


Now choosing c with M < c < ^ ^ ^ ® 

depending on the situation, we obtain a contradiction that 
v(Bx-ce) 2 2. Hence Bx is monotonic. 


Next, we shall establish that B is CMP» 


If it is not so, 

- -- _ . xT 

then there will exist nondecreasing vectors x x^,...» 

> \ !I? nr "v r^nd. V ^ y STJicli 

and y = \ ^ h ^n 


(Bx), < (Bx) and (By)^^ > (By)^^. 
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Clearly for any c s (0,oo) , cx+y is T. This implies that 
cBx+By is monotonic. Moreover it is nondecreasing for 
sufficiently large values of c and nonincreasing for 
sufficiently small values of c. Let c^ = sup{cj cBx+By is X} 
Then Cq exists and is positive. Evidently, for e>0, 

(Cq~c)Bx + By is X and (cQ+e)Bx + By is f, 
so that for k=l, . . . ,n~l 

((cQ-e)Bx + By)j^ > ((cQ-e)Bx + By)j^^^ (3.3.12) 

and 

((cQ+e)Bx + By)j^ < ((c^^+e)Bx + By)j^^^. (3.3.13) 

From (3.3.12) and (3.3.13) we get 

-e{(Bx)^_^^-(Bx)^} < (c^Bx+By)j^-(c^Bx+By)j^^^ 

< e{ (Bx)^^^-(Bx)j^}. (3.3.14) 

Since e is arbitrary it follows at once that 

CqBx + By = ae (3.3.15) 

where a is a real constant depending on x and y. Indeed, 

Cq also depends on x and y. Let us replace c^ by c(x,y) 
and a by a(x,y) in (3.3.15) so that 

By = a(x,y)e - c(x,y)Bx. (3.3.16) 

In this relation, fixing x as x^ (which could be any 
vector with (x^)^ < (x^)^ and (Bx^)^ < (BXq)^^) we have 

By = a(y)e - P(y)n (3.3.17) 

where a(y) ** c£(xQ,y), P(y) = c(xQ,y) and u = Bxq. Similarly, 
fixing y as y^ (which could be any vector with (yQ)]^ ^ ^^o^n 
and (ByQ)^^ > (By^)^) in ( 3 . 3 . 16 ) and substituting for By^ 
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obtained from (3.3.17) we find after oimpllfioatlons that 
"there exist scalars a(x) and p(x) such "tha't 

Bx = a(x)e - P(x)u. (3.3.18) 

If x^ < x^ with (Bx)^ = (Bx)j^ or if trivially Bx is 

of the above form since Bx is a constant multiple of e. From 
the above arguments it is clear that if x is Tj then in any 
case Bx assumes the form given in (3.3.18). Purthemore, 
it is well knowQ that any real vector can be expressed as 
a difference of two nondecreasing vectors. Herce for any 
X Bx assumes the same form given in (3*3.18) which is 

always monotonic since u is monotonic. This implies that 
for any x c v(Bx) < l which is contrary to the VP 2 property 
of B, since (as n > 3) there are vectors inB^ with variation 
2. This contradiction establishes the CMP property of B.aaa 

LEMMA. 3.3.2. Let B be row stochastic, VP 2 and 


trend-preserving. Then B=I 
PrQOj, We shall first 
has at least one entry as 1. In this process, we also 


PrQOj. We shall first show that each column of B = (bj_^) 


exhibit that b.. 


11 nn 

As Be»e, for a real c, obviously 


1 , Define f. — e— e . for j— l,».,,n, 

u 0 


lifO<c<l, . . 

v(Bf^-ce) = v(f^-oe) = { g otherwise. 0.3.19) 

Since is i, clearly is J,. Moreover, > 0. 

If b / 1» let it be a so that 0 < a < 1. The case a =0 

11 />i\ ■ . . 

leads to the situation « Q and we have 1 = v( 6 ^- 62 ) = 

« v(Be 3 ,-Be 2 ) - v(^^^^) « 0 , a contradiction. lienee 0 < a < 1 . 
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Since (f^-ce) is 'f, (Bf^-ce) is 't'. Now by choosing c such 
that 0 < c < l-a, we see that (Bf^-ce)^ = 1-c-a > 0. 
consequently v(Bf^-ce) = 0 which violates (3.3.19). Hence 
"^21 “ similar analysis it can be easily shown that 


m 


1 . 


Our next claim is that for each d=2,.,.,n-l, b =1 

Id 

for some i=i(j) . Let d *> {2,...,n-l}. It may be observed 


that 


v(Bf^~ce) = v(f^~ce) = { 


2 if 0 < c < 1, 
0 otherwise. 


(3.3.20) 


If b^^ ^ 1 for all i=l,...,n, then 0 < p <_ l, where 
p = min(l-bj_j) . Now for any c satisfying 0 < c < p, we have 
Bf .~ce > 0 and so v(Bf -ce) = 0 which contradicts (3.3.20) 

U J 

and hence for each d = {2,...,n-l}, b.. = 1 for some i=i( j) . 

We have established that each column of B has at least 

one entry as 1. In view of B 2 0 and I b,-^ = n, it easily 

i>d ^ 

follows that each column of B has precisely one nonzero 
entry 1, Since every row sum of B is 1, it further follows 
that B is a pemutation matrix. Finally, by considering 
X == (l,...,n) , we use the trend-preserving property of B 
to conclude that Bal. 


We are now prepared to establish the following main 
result of this section which gives a complete characterization 
of VPo matrix transformations. 
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THEOREM 3.3.4. Let A = (IR) with n 2 3. Then the 
following three st&teinents ere et^^uivelent . 

(i) A is VP^. 

(ii) A is VP. 


( iii) A is a diagonal (or skew-diagonal) matrix having 

only positive or only negative entries in the diagonal 
(or skew-diagonal). That is to say, A has one of 
the four forms 



where > 0, i=l,,..,n. 

Proof, (iii) ==> (ii) ==> (i) is trivial. We shall 
therefore prove that (i) ==> (iii). Assuming (i), we have 
in particular that A is VPq, Hence by Theorem 3.3.3* either 
A20» or A < Of or there exist nonzero vectors u,w (=]R^ 
with 12,0 such that Ax = (w^x)u for all x cH . For the 
last mentioned possibility and as well as for the case A=0, 

we have v(Ax) =» 0 for all x whereas there exists 

n 

X «m (n 2 3) with v(x) = 1 or 2. This contradicts the 
Vp2 property of A. Hence A2 0, orA^O and has at least 

one nonzero entry. Denoting the i-th row sum of A by s^, 

i«l,,.,,n* let us construct a new matrix B as follows. 
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Let xc ]R be arbitrary. For k=l,,..,n define 


(Bx) = { if ^ 

if ®k = °- 


where p is an integer with smallest possible |pl satisfying 
^ 0. If two such p's are available, take the positive 


k~p 


one. 


Thus , whenever Sj^ 0 , the k-th row of B is 


( ^kl ^ ^k2 ^ ^ ^kn \ 

^ s, ’ s, ’ ‘ ' s, ' 
k k k 


and after defining all such rows* a row of B corresponding to 
a vanishing is taken as the one which is nearest to its 
position among the rows already defined for nonvanishing 
row sums. If there are two such nearast rows any one can 

be chosen. However for the sake of definiteness, choose 

the preceding one among the two. 

An immediate observation about B is that B > 0 and is 
row stochastic. Another interesting fact is that B is VP2. 

To SGG this, first let us consider the case A ^ 0, Whenever 

0, for any x , 

sgn((Ax)j^) * sgn((Bx)j^) 

where sgn(a) (read.* signum a), for a real a, is defined as 

1 if a > 0, 

sgn(a:) = { 0 if a = 0, 

-1 if a < 0. 

If . 0, obviously Obc)^ = 0 . on the other hand (Bx)^ 
may admit any signvim but it will be in such a maimer that 
v(Bx) - v(Ax) . This will be clear from the following 

arguments 1 
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(1) If = 0 for k=l,,.,,r (r<n) and ^ 0 then 

all k=l,...,r admit the same signum as that of 

and hence, 

v{ (Bx) ^ , . • . > (Bx) (Bx) = 0 . 

(2) If = 0 for k=m+l,...,n (m > 0) and ^ 
then all (Bx)j^, k=m+l,...,n admit the same signum as that 
of (Bx)^. Hence, 

v{(Bx)^,(Bx)^_j_^,.,.,(Bx)^} = 0. 

(3) If = 0 for k=m+l,...,r where m > 0 and r < n 

with Sjjj ^ 0 and s^^j^ / first [(r-m+l)/2] terms 

in the sequence (Bx)^_j_j, (Bx)^ will have the same 

signum as that of (Bx)jjj and the remaining terms will have 
the signum as that of (Bx)^_^^ where [(r-m+l)/2] denotes the 
integral part of (r-m+l)/2. Hence, 

= v{(Bx)j^,(Bx)^^^}. 

Thus, since the possibilities (l)-(3) are exhaustive, in any 
case the inclusion of coordinates of Bx corresponding to 
vanishing row sums of A does not affect the variation in the 
sequence of coordinates of Bx corresponding to nonvanishing 
row sums. Hence we have v(Bx) = v(Ax) , Since B is the 
same for both A and -A, we have for A < 0, v(Bx) = v(-Ax) 
v(Ax) . Since A is VPg, it now follows that B is also 

Now by Lemma 3 . 3 * 1 , B is CMP. Without loss of 
generality we can assume that B is trend-preserving, 
otherwise, i.e.. if B is trend-re^erstag, we consider Bb in 
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the validity of VP^ <==> vp in +h^ 

2 m the discrete case as in the 

continuous case, now it is quite natural to seek for some 
counterexamples as above in the discrete case also. Therefore, 
we tried to construct some matrices which are VP^ but not VP 2 
and we could not succeed in our attempt. On the contrary, to 
our surprise, we were subsequently able to establish that also 
VP 3 _ implies VP in the discrete case. The proof of this 
interesting result depends on the following basic lemma which 
is a refinement of Lemma 3.3.1 with some weaker hypothesis. 

LEMMA 3.4.1. Let B be row stochastic and VP^, Then 
B is CMP. 

The monotone-preserving property of B follows 
as in the proof of Lemma 3.3.1. Since we are not allowed 
to use ^^2 P3^op®rty, there is no use in continuing further 
as in Lemma 3*3.1. Therefore, let us argue in a different 
manner, 

T 

Consider g = (0,1,. . . ,n-l) . Then 

v(Bg-ce) = v(g-ce) = { ?; 0 < c < n-1, 

0 otherwise. 

(3.4.1) 

From this it is clear that Bg cannot be a multiple of e. 

Let m !» m|.n (Bg)^ and M * m|x (Bg)^. We shall prove that 
m«0 and M*n-1« Since (Bg)j_ ^ 0 for i=l,...,n, obviously 
m 0, If m/O, then for sufficiently small c satisfying 
0 < c < m, we have 'v(Bg-ce) = 0, a contradiction. Hence m=0. 

If M > n-1, then for cen-l, we see that v(Bg-ce) = 1, 
contrary to (3 *4.1). Again if M < n-1, for c with 
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M < c < n-1, we have a contradiction, namely v(Bg-ce) = 0. 
These arguments assert that M=n-1. 

Now, in view of the monotonicity of Bg, it follows 
immediately that either 

(Bg) = 0 and (Bg) = n-l (3.4.2) 

-4 XI 

or 


(Bg) = n-1 and (Bg) = 0, (3.4.3) 

III 

T 

For any x = it can be easily shown that 

"2||xl(^g < x-x^e < 2||x(Lg (3.4.4) 

and 

-2 |}x 11«,( ( n-1) e-g) < x-x^e 1 2 lix m ( n-l) e-g) (3.4,5) 

where 


)|xj|^ = jx^l ^ 


Since B 2 0 and Be=e, it follows from (3.4.4) that 

-2|lxll«pg < Bx-x^e < 2lix||^g. (3.4.6) 


Therefore, if we assume (3.4.2), then by considering the 
first coordinates in the vector relation (3.4.6), we have 


(Bx) 


1 “ 


Similarly by considering the n-th coordinates 


in the vector relation obtained by pre-mult iplying (3.4.5) 


by B, we got (Bx)^ » x^. 

On the other hand if we assume (3.4.3) , then in a similar 


manner it easily follows that (Bx)^ = x^ and (Bx)^ = 

The above arguments hold in particular for any nondecreasing 
vector x. Hence we have shown that x is always implies 
Bx is 'T or always implies Bx is X* ^^ts completes the 


proof of the lemma. 
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We can now state and prove the main theorem of this section. 

THEOREM 3,4.1. As (n > 2) is VP^ iff A is a 

diagonal (or skew-diagonal) matrix having all positive or 
all negative diagonal (or skew— diagonal) entries, 

Pr^. The sufficiency part is immediate. To establish 
the necessity part we use the method of induction. First of 
all, we shall get rid of the case n»2. 

Since, in particular, A is VI^ , by the same arguments 
given in the earlier part of the proof of (i) ==> (iii) in 
Theorem 3,3.4, A is either nonnegative or nonpositive with 
at least one nonzero entry. Hence A is always of the form 

with a, b, c, d 2 0* Consider x = [^-] with p > 0 

so that Ax = i Cep-d^* Since v(x)=l, it follows that 
v(Ax)«1 implying, in either case, that 

(ap-b)(cp-d) < 0. (3.4,7) 

If ac 0, then for sufficiently large values of p, 

( 3 . 4 , 7 ) does not hold* Hence ac=sO. Similarly if bd ^ 0, 
then for sufficiently small values of p, (3.4.7) does not 
hold. Hence bd»0. If a=0 then either c or b being 
zero will lead to a contradiction. Hence if a=0, then c / 0 
and b / 0 which in turn implies that d=0. On the other hand, 
if a / 0, then c»0 which implies that d 0 and hence b=0. 

This establishes the necessity part for the case n=2. 

Before applying the induction process, once again we 
proceed along the same lines as in the proof of (i) ==> (iii) 
in Theorem 3.3.4 to construct the matrix B as explained over 
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there. By the arguments in that proof, B is row stochastic 
and VP^ since A is VP^ and v(Bx) = v(Ax) for all x s]R^. 

Now hy Lemma 3.^«1, we know that B is CMP, As in Theorem 


3.3.4, there is no loss of generality in assuming that B 
is TRP, since otherwise we can consider DB in the place of 
B where D is the skew-diagonal matrix defined by (3.3.21) . 

In Lemma 3.3.2, in fact we have used only VP^ property of B 
to prove that = b^ = 1 and B = assumed to be 

row stochastic and TRP, Hence B can be expressed in the 

Hh 0 n. 1 

partitioned form g] where E c M^_^^0R) and v «:3R . 

For any y = , 

B[°] = [1 g] [°] = 

n— 1 

It shows clearly that E is VP^ onH . Now to begin the 
induction process let us assume that the necessary part of the 
theorem under consideration is true for (n-l) -dimensional case. 
Since the bottom ri^t corner element of E is already 1, E 
cannot be skew-diagonal and hence we must have 


1 0 0 0 0 

“2 ^2 0 ° ° 

0 ... 0 0 

♦ • • • • 

• • • ♦ • 

• # • • ♦ • 

o!n-l ° ° ••• Vl ° 

0 0 0 ... 0 1 


Where Z 0, q^> 0, 1-2 n-l. Now consider x = ce^^-e^ 

with 0 > 0. Clearly v(x)=l. By direct computations. 
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i c 

Bx = 

0 

^ • 

Since c > 0 ^ and ci.^c> 0 for i=3i...,n-l, a necessary 
condition for v(Bx) = 1 is 

^ 2 ^ ” 0,2 ^ (3 •4.8) 


Moreover if any one of ^ is nonzero, by virtue of 

(3.4.8) , v(Bx) > 1. Hence = ... = = 0. If a 2 7 ^ 0 

then c could be chosen large enough so that (3.4.8) is not 
true. Hence a 2 *= 0 . Since Be=e, it follows that B=I. 

Heucc by the same arguments as in Theorem 3.3.4, we conclude 
that A is having any one of the four forms given in the 
statement of the present theorem. We have already shown 
that tho result is true in the 2x2 case. This completes 
the proof of the theorem. ^ 


REMARK 3.4.1. The theorem just proved gives the 
structure of VPj matrix transformations and from this we 
infer that <=»> VP for m=l,...,n-l. In particular, 
Theorem 3.3.4 follows from Theorem 3.4.1. It may also be 
noted that any nonnegative or nonpositive matrix which is 
not of the four forms given in Theorem 3.4.1 is VP^ but 

Thus, in the case of matrix transformations 


falls to be VP^. 
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1 is the smallest integer m such that VP ==> vP, This 
is quite startling in the light of the corresponding result 
in the continuous case in which m=2 is the smallest such 
integer. Also a comparison of the proofs of Theorems 
3 . 3 *4 and 3.4.1 clearly indicates the simplicity and the 
po\\fer of induction arguments, whenever applicable. 

We conclude this section by giving a characterization 
of trend-preserving matrices. ¥e have already used the 
concept of trend-preserving matrices in the study of 
variation-preserving matrices. We recall that A e M^OR) 
is trend-preserving if (Ax)^ 1 ••• 1 for all 

x= satisfying <,..£ x^. 

A special interest in trend-preserving matrices 
arises due to the interpretation of stochastic matrices as 
diffusion processes where it is of natural interest to know 
whether the next stage keeps a trend the same. 


THEOREM 3.4.2. A = (a^^) «= is trend-preserving iff 

^ ^ ^ i,k=l,.. .,n-l (3.4.9) 

j=sl 1+1,0 iO 

and every row of A has the same sum. 

Proof, Let X = (x, , . . . ,x be such that x, < ... < x . 
If A is trend-preserving, then for i=l,...,n-l 

(Ax) i < (Ax) 


i+1* 


Hence, 


" (a.. - < 0. 1=1 n-1. 


0*1 


"iO “ ‘=‘1+1,0' 0 


(3.4.10) 
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Wow setttog X = -(e^+e 2 +...+e^) in ( 3 . 4 . 10 ) for k=l n-1, 

we have (3 *4, 9 ) . Next, we shall put x = -e in (3.4,10) 

SO that 


n 

2 (a. 

0=1 1+1 »d 


a. .) < 0. 
10 


(3.4,11) 


Similarly if we put x = e in (3.4.10), then 

n 

2 (a , - a. .) > 0. 
j=l 1+1 »0 10 

From (3.4.11) and (3.4,12), it follows that 

n n 

.2 a,., . = Z a. , 

0=1 1 + 1,0 0=1 Id 


(3.4.12) 


(3.4.13) 


i.G., every row of A has the same sum. 

Next, WG shall assume that (3.4.9) and (3.4.13) are 
truG. Let 1, .»• £ Simple calculations show that 


n 


(Ax) . T - (Ax). = Z (a.,. . - a..)x. 
1+1 1 d=i 10 0 


n 


which turns out to be nonnegative in view of our assumptions. 
It may bo observed that the last teim vanishes because of 
(3.4,13). The proof is now complete. ^ 


It may be noted that the conditions (3.4.9) are 

equivalent to saying that each one of the (n-l) vectors 

I k»l,...,n-.l, denoting the d-th column of A, 

d^l 

is nonincreasing. 
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An obvious consequence of the above the or an is the following. 

COROLLARY 3.4.1. A s M^OR) is trend-reversing iff 
k 

3=1 ^^i+1,0 " ^i 3 ^ ~ i,k=l, . . .,n-l (3.4.14) 

and every row of A has the same sum. 

3.5. Inertia-, Angularity- and Zero-preserving Transformations 

on £C^. 

In the present section we confine our attention to 
matrix transformations on We shall determine the 

structure of complex matrices preserving inertia, angularity 
and the number of Eero coordinates of vectors in (E^. Amongst 
other problems of a similar nature involving transformations 
mapping into inteelf which have already been solved, we may 
recall the well-known result that A: iC^ 3C^ is norm-preserving 
iff it is unitary [l8, p.230]. 

Wo require, to begin with, the following definitions. 

DEFINITION 3.5,1 (see Bahl and Cain [26]). The inertia 
In(x) of a vector x « is defined as the ordered triple 
(nCx), v{x), d(x)) where the entries in the triple denote 
respectively the number of coordinates in x with positive, 
negative, and zero real parts. In other words, if 
X « (x^, .. . then In(x) = In(D) where D = diag(x^, . . . ,x^) . 

Similarly the angularity ©[x] of x can be identified with 
the angularity of D « diag(x^,...,Xj^) » i.e., ©[x] = ©[d] . 
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In the sequel, e^, e and will have the same meaning 
as used in the previous sections. It may he noted that 
In(e) = (n,0,0) and InCe^) = (l,0,n-l) for 

DEFINITION 3.5.2. A c is said to he inertia-preserving 
if In(Ax) = In(x) for all x s 5)^5 it is said to he 
angularity-preserving if 9 [Ax] = 0[x] for all x e 

It may be observed that A preserves inertia whenever it 
preserves angularity. 

DEFINITION 3 . 5 . 3 . A s is said to be zero-preserving 
if Z(Ax) = Z(x) for every x c where Z(x) denotes the number 
of zero coordinates in the vector x. 

The definitions of permutation matrix, generalized 
permutation matrix (g.p.m.) , and nonnegative g.p.m. are 
already given in Section 1.3 and these will be required in 
the following results. 

The first result of the section characterizes 
inert ia-pror,orving matrices. It will also be of use in 

proving some resiilts in the next section. 

THEOREM 3.5.1. A «= is inertia-preserving iff A is 

a nonnogative generalized permutation matrix. 

The "if part is obvious . On the other hand, 
if In(Ax) « In(x) for every x « then by considering 

X « e. we immediately conclude that each column of A has 
precisely one entry with positive real part and the renaining 
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(n-l) entries are having zero real parts. Similarly by 
considering x = v^~i e^ we see that each entry of 1 ^ 

j=ls...,n has zero real part implying that A is real. 

Combining the above two observations} it follows that 
each column of A has precisely one positive entry, the other 
entries being zero. If any row of A has more than one 
positive entry then at least one row of A will have all zero 
entries so -that (n,0,0) = In(e) = In(Ae) ^ (n,0,0). This 
contradiction completes the proof of the theorom. AAA 

As an immediate consequence of the above theorem, 
we have 

COROLLARY 3*5.1, A ss is angularity-preserving iff 
A is inertia-proserving. 

Also wo have the following corollary. 

COROLLARY 3.5*2. Let A c be such that for each 
X c (E^, there is a permutation matrix such that Ax = 

Then there is a fixed permutation matrix P such that Ax = Px 
for every x « JlP. 

Proof. Prom the hypothesis, it is clear that A 
prosorvos the coordinates of x but in a permuted form. 
Therefore, clearly A is inertia-preserving and hence it is a 
nonnegative g.p.ra,* Since each of the entries of Ae is 1, 
it follows that A is a permutation matrix, AAA 

THEORM 3,5.2, A«M^is zero-preserving iff A is a 
generalized permutation matrix. 
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Pro^. The sufficiency part of the theorem is obvious. 

To prove the other part, we see that Z(A^j)) = Z(Ae .) = Z(e.) 

D 0 

= n-l, showing that each column of A has precisely one nonzero 
entry. If any row of A has more than one nonzero entry, 
then there is at least one row of A having all zero entries. 

In this case, we have the contradiction 0 = Z(e) = Z(Ae) ^ 0. 
The proof is now complete. 


3.6. Inertia- and Angularity-preserving Transformations on 
Cirailants . 


A matrix of the form 


°o °1 ••• Vl 

°n-l % Vz 

• « • 

# • • 

^ C2 ... Cq 


(3.6.1) 


in which each row is obtained from the preceding row by 
shifting the elements cyclically one column to the right 
is called a circulant matrix (see Lancaster [l3, p.267], 
M±x«ky [18, p.432]) or simply a circulant (see Bellman 
[4, p,242], Marcus and Mine [l6, p.66]) . A circulant matrix 
is also known as a cyclic matrix f285]. Since the term 


"circulant" was originally used to refer to the determinant 
of a matrix of the form given in (3.6.1), Good [29l] called 
a circulant matrix by the name circulix. Several authors 
(e.g. Var^ [306]), define a circulant matrix as a matrix 
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of the form 



For some more references, regarding the two definitions of 
circulant matrices, one may refer to the survey paper by 
Roebuck and Barnett [302]. 

Throughout our study in the thesis, by a circulant we 

mean only a matrix of the form given in (3.6.1) and we 

follow Davis [287,288] in denoting this matrix by 

circ(c^,c^,. . . . We denote the class of all nxn 

circulants by C^. In the sequel is used to denote the 

class of all nxn diagonal matrices. F denotes the discrete 

l) (k— iX 

Fourier transform (DFT) matrix defined by F = (n “ 0 , 

;3,k»l,. . . ,n, where exp(2iii/n) with i -sfZ, It may be 
noted that F is unitary and symmetric. In the present 
section P is reserved to denote the special circulant 

circ(0,l,0,.,.,0). 

The interest in the study of circulants stems from 
their applications in various fields. They arise in the 
numerical solution of boundary value problems by boundary 
contraction [298] and are also used to approximate and 
explain the behaviour of Toeplitz matrices which have 
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nuinsrous s.ppliCQ'fcions in infomis.’fcioii "thsory find, ssiinifl'tioii 
"theory C2933 • Pynsinical sysienis involving cipcxilan'ts 
arise in "the study of molecular self -organization and also in 
the evolution of animal behaviour [303], For some more 
applications, especially in solid-state physics and statistics, 
one may refer to the recent paper by Searle [304], Moreover, 
the advent of fast Fourier transform (FFT) methods [5] has 
greatly enhanced the use of circulant approximations [2953 
in various fields. 

Circulants are interesting geometrically and simple to 
handle algebraically [2873. They have many desirable and 
nice properties as listed in Searle [304]. In particular, 
we note that under the usual matrix addition and multiplication, 
Cyj is a commutative ring isomorphic to the ring of diagonal 
matrices [284]. In fact, 

circ(cQ,Cj, . . . ~ F*diag(KQ,\;]L, . . . 

= Fdiag( Vl * • • • 

where 

n— 1 

A, * 2 c ,exp(2jTi:ki/n) , k=0,i, , , . , ,n-l. 

k 3**0 0 

Since F is unitary, it follows that 

a2re the eigenvalues of circ(cQ,C2 , . . . , that circulants 

are normal and further that whenever nonsingular they have 

a circulant inverse. It may also be observed that the 

2 n-1 

eigenvalues of P are l,(o,w,..»> w 
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In this section, we first present a basic theorem 
incorporating various characterizations of circulants, 
collected from different sources. Next, we deteraine the 
set of all nonsingular matrices S such that S*CS is a 
circulant for every circulant C. Finally, based on some 
of the results proved in the preceding section, we characterize 
the structure of the linear transfonnations T; "* that 

(a) T(Cj^) c and In(T(C)) = In(C) for all C s and 

(b) T(CJ c: C and 0[T(C)] = Q[C] for all C « C„, 

The characterizations of circulants are simmarized as 
THEOREM 3 •6,1, Let C c and P and F denote respectively 
circ(0,l,0, , . . ,0) and DFT matrix. Then the following conditions 
are all equivalent. 

(1) C = c„. 

(ii) C is a polynomial of degree < n-1, in P, 

(iii) C is a polynomial in P, 

(iv) ACB s for any nonsingular A and B in C^, 

(v) C commutes with P. 

(vi) FCF* s 

(vii) F*CP « 

(viii) The colxomns of F* are eigenvectors of C. 

(ix) The columns of F are eigenvectors of C. 

(x) f2c(F*) 2 s C^. 

(xi) (F*)^CF^sC^. 
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Proo^. It is well known that circ(cQ,C 2 j...,c^„l) 

^ 3^0 f or r = s (mod n) . Thus, 

(i) ==> (il) ==> (iii) ==> (i) , The equivalence of (i) and 
(iv) follows from the closure property of under usual 
multiplication. The proof of (i) <==> (v) is given in 
Ablow and Brenner [279j and also in Brenner [283], The 
equivalence of (i) , (vi) and (viii) is part of the substance of 
Chalkley's paper [284], 

A quick way of seeing that (i) <==> (vii) is to 
T 

consider C which is a circulant if C is so [308]. Now 
(i) <==> FCF s <==> (vii) since F^=P, Clearly (vii) 

<==> (ix) . It is easy to see that (x) <==> (i) <==> (xi) . 

In fact, if C c then F^C(F*)^ = (F*)2 cf2 = C^, AAA 

In Theorem 2.4.5, we characterized the class of all 
nonsingular matrices C such that C*AC is normal for every 
normal A, Now we shall consider a similar problem for 
circulants. For this purpose, we shall first prove three 
simple lemmas. 

LEMMA 3.6.1. If S s M^ is nonsingular and S”^S s 
then S*S s 

Since S”^PS s C^, F(S ^PS)F* = D «= Dj^. This 

shows that the columns of SF are eigenvectors of P. Since 

the eigenvalues of P are all distinct and the eigenvectors of P 
are columns of -F also, it follows that SF* = F^Q for some 
g.p.m. Q, A simple calculation now reveals that F(S*S)F* @ 
and hence S*S c 


AAA 
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REMARK 3.6.1. If S*S is a circulant, then S“^PS need 
not be a circulant. It can be easily verified by taking 

s = [_i i], say. 

LEMMA 3.6.2, Let S s be nonsingular. Then S*CS s 

for every C e iff S ^PS <s Cj^. 

Proof. If S ^PS s C then by Lemma 3.6.1, S*S s; C . 

“j 

Hence S P'^S e C for 0=0,1, ... ,n-l which shows that 

CjP^)S <s for any choice of ^^, 0 ^, , . . This, 

by (i)-(ii) of Theorem 3.6,1, proves the "if" part of the 
lemma. To prove the "only if part" we choose C=I and C=P 
to get S~^PS = (S*S)“^(S*PS) s C^. 

LEMMA 3.6,3. Let S s be nonsingular. Then S”‘^PS e 

iff S = F* QF for some g.p.m. Q. 

Proof. The necessity part follows along the proof of 
Lemma 3,6,1, Conversely let S = F*QF for some g.p.m. Q. 

Then F(S*"^PS)B^ = F(F*Q”%) P(F*QF)F* = Q'^SQ c since 
D = FPF e and a g.p.m. can always be expressed as a 

product of a diagonal matrix and a permutation matrix. AilA 

Combining Lanmas 3.6,2 and 3.6.3 we obtain the following 
chara ct eri zat ion , 

THEOREM 3.6.2, Let S c be nonsingular. Then 
S*CS <s Cjj for every C « C^^ iff S « F*QF for some g.p.m. Q. 

REMARK 3.6.2. It may be noted that if any two of the 
matrices S*S, S*PS and S~^PS are in then the remaining 
one is also in C^^ and S*CS cs C^^ for every C s Cj^. 
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We shall now consider the problem of determining the 
structure of any linear transformation T of into 
mapping circulants into themselves and preserving inertia 
of each circulant. In this connection we will require some 
lemmas. 


Let Us be a linear transformation 


LEMMA 3.6|.4. 

satisfying In(U(D)) = In(D) for all D s D^, Then there 


exists a nonnegative g.p.m, Q such that U(D) = QDQ* for a^ 


D e n 


n' 


n 


Proof, Let us consider the linear map Vs defined 


by V(D) = Be. Clearly V is one-to-one and onto and the 

-1 ' -1 

inverse transformation V s iC^ — is defined by V (x) = 

T 

diag(x) where diag(x) for x = (x^,...,x^) is defined as 
diag(x^, . . , ,x^) . Define Ls ffP by L(x) = VUV“^(x) . 

L is linear. Moreover, In(Lx) = In(V”^(Lx)) = In(UV“^(x)) 

= In(V”^(x) ) = In(x) for all x c £C^. Hence by Theorem 3.5.2, 
there exists a nonnegative g.p.m, M such that Lx = Mix for all 
X <s Given a diagonal matrix D, we can choose x <= , 

such that D = V"^(x) . Thus U(D) = ir^LV(D) = V“^(MDe) = 
diag(MDe) , It is not difficult to s ee that diag(QDe) = QDQ 

r\/ 

for a ipermutation matrix Q and a diagonal matrix D . Since 
M can be expressed in the form 0^^ where is a permutation 
matrix and D^ is' diagonal, it follows that U(D) = diag(Q 2 _D 2 De) 
= Cl^(D^D)Q^ = QDQ* where Q = being the positive 

definite square root of D^. This completes the proof. 
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LEMMA 3.6,5. Let S; be a linear transformation 

satisfying In(S(C)) = In(C) for all C e Then there exists 

a nonnegative g.p.m. Q such that S(C) = F*QFCP*Q*F for all 
C «s F being the DFT matrix. 

Proof. Define ¥.» C - by W(C) = FCF* for all C = C . 
Obviously ¥ is a one-to-one linear transformation from onto 
Dj^ and ¥“^(D) = F*DF for all D s D^^. Clearly Us D^^ -* 
defined by U(D) = ¥S¥“^(D) satisfies the hypothesis of the 
preceding lemma. Consequently, for all C s we have 
S(C) = ¥"^U¥(C) = ¥“'^(FCF*) = ¥'"^(QFCF*Q*) = F^OFCP'^Q^F where 

Cis anonnegative g.p.m. The proof of the lemma is complete. 

AAA 

LEMMA 3.6.6. Let Ss M^ -♦ M^ be a linear transformation. 

Then S annihilates each nxn circulant, i.e,, S(C) = 0 for all 

C s iff there exists a linear operator L on M^ such that 

S(a) » L( (FAF*)* ( J-I) ) for all A s (3.6.2) 

J being the matrix whose elements are all 1 and * (other 

. u. 

than in the superscript) denoting the Hadamard prodct. 

Proof. Since FCF* e D for all C e C , the "if" part is 

clear. To prove the converse, due to the linearity of S 

we have 

S(A) = S(P*((FAF*)*(J-I))F) + S(f''((FAF*)*I)F). (3.6.3) 

The second term on the right hand side of (3.6.3) vanishes 
since S(C) = C for all C s Cj^. Hence the result follows 
by choosing L as the linear operator defined by 

L(X) = S(F^XF) for all X s= M^. AAA 
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We are now in a position to prove our main result 
concerning the structure of inertia-preserving linear 
transformations on circixLants. 

THEOREM 3.6.3. Let Ts ^ ^ Le a linear transfonnation. 

Then for all C s T(C) & and In(T(C)) = In(C) iff there 
exist a nonnegative generalized permutation matrix Q and a linear 
operator L on such that 

T(A) = R(A) + S(A) for all A « (3.6.4) 

where 

R(A) = f^QFAP^Q^ 

and 

S(A) = L((FAF*)#(J-I)). 

Proof. To prove the sufficiency we note from Lemma 5.6,6 
that S(C) =0 implying T(C) = R(C) for all C c C^. By using 
the characterization mentioned in Theorem 3.6.l(vi) it 
is easy to see that R(C) e for any circulant C, Now by 
applying Corollary 2.4.3, we have In(R(C)) = In(C) , Hence 
T(C) c and In(T(C)) = In(C) for all C c 

To prove the necessity, we infer from Lemma 3.6,5 that 
there exists a nonnegative g.p.m. Q such that T(C) = R(C) 
for all C c C^, The transformation T: defined by 

T(A) = T(A) - R(A) is linear and annihilates circulants. 

Hence Lemma 3.6.6 is applicable and we have our main theorem. 
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remark 3.6,3. By virtue of Corollary 3.5,1, it follows 
that T will have the same structure if it preserves angularity 
instead of inertia in the formulation given in the above 
theorem. The same conclusion can be drawn also from 
Theorem 3.2,3 in the light of the remark made just after that 
theorem, because is closed under scalar multiplication. 



4. ITERATIVE SOLUTIONS OF THE LYAPUNOV AND SYLVESTER EQUATIONS 
4.1. Introduction 


In this chapter we discuss some iterative methods for 
solving the Lyapunov matrix equation and the more general 
Sylvester matrix equation. The methods considered for the 
Lyapunov matrix equation are motivated by the clas’sical 
Newton' s method for determining the sign lunction of a matrix 
A with 6(A) =0. It has been discussed in Section 1,2 that in 
the Nev/ton method, one generates a sequence of matrices (V 
defined through 

\+l (4.1.1) 


with Here if A is positive stable, A^^ -*■ I as k “*■ oo. 

In other words, for solving the Lyapimov matrix equation 



AX + XA* = C, 

(4.1.2) 

where 

A is positive stable, if we define 



Vh- 1 = “k^k " Pk^^'^k^*' =0=° 

(4.1.3) 

where 

A = a A + P A”^, A =A 

k+1 k k k k * 0 

(4.1.4) 

and 

“k = Pk = 

(4.1.5) 

then, 

inductively, we have 




(4.1.6) 

and it 

follows that 



X = (ijlfoo C^)/2 

(4.1.7) 

is the 

solution of the system (4,1,2), In the above 
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definition of A, ^ , each of the matrices A, and A, ^ are given 
the equal weight 1/2. In this context, as described in 
Section 1.4 of the survey chapter, Hoskins, Meek and Walton 
[205,207,208] and Barraud [163] developed several other 
choices of and to improve the performance of the method. 

In Section 4.2 of this chapter we reconsider a choice of 
a, and p, given by Hoskins, Meek and Walton £207]. Several 

xC K, 

counter examples suggest that the algorithm requires some 
modifications. Even with these modifications it is shown 
that the method does not converge for all stable (positive 
stable) matrices A, For the modified method, however, we 
are able to establish the theoretical convergence of the 
algorithm, in case A or —A is stable with a real spectrum. 

In Section 4.3 we develop a new choice of and and 
establish the theoretical convergence of the method so obtained 
for the class of normal matrices A. In the last two sections 
of this chapter we study the feasibility of the Kaczmarz and 
the residual projection methods for solving the Sylvester 
equation AX+XB=C. There our main object is to show how 
these projection methods could be compactly implemented for 
solving the Sylvester equation without giving rise to 
significant storage problems. Also, our numerical 
experience has shown that in certain cases the implementation 
of these methods is rather fast. 
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4.2. Convergence of an Algorithm of Hoskins, Meek and Walton 


The key idea behind the class of iterative methods 

generalizing Newton’ s method for solving the Lyapunov matrix 

equation (4.1.2) is to choose a, and p, so that the iterative 

k "^k 

scheme (4,1,4) converges to I (or -I) more effectively when 
A is positive (or negative) stable. In the choice of a , P 

xC xC 

suggested by Hoskins, Meek and Walton as described in (1.4.49)- 
(1.4,52) , for solving AX+XB=C where A and B are stable, if we 
specialize B=A^ or A* we get the choice 

cc^ = an<i (4.2.1) 

in the iterative scheme (4,1.4) for a stable matrix A, 
where a (X) and P (X) for X <s are defined by 

a(X) = a(X)/Y(X) 

P(X) = p(X)/Y(X) 

a(X) = tr(X)tr(X“2) « ntr(X“l) 

P(X) = tr(X”^)tr(x2) - ntr(X) 

and 

Y(X) = tr(X^)tr(X”^) - n^. 


(4.2.2) 

(4.2.3) 

(4.2.4) 

(4.2.5) 

(4.2.6) 


Regarding the above choice the following may be observed; 


(a) Referring to Walton [274], Hoskins et al. [207] 
state that 

\ = -I. (*•2-7) 

■»0 S 0 

However, if we consider A = [“ q then we have 

°‘'o ~ ^o ~ ”■0,4 and thus A = [q Hence it appears that 

Aj^ tends to I rather than to -I as k -* 


00 ♦ 
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(b) Even if this probable error in the sign with I is 
ratified in (4.2,7) » still we find that the result is not 
valid for a general stable matrix A. For example, if we 
consider 

” -1 4-V33 0 0 " 

- 1-10 0 

A = , ; 

0 0 ~4 yT3-i 

0 0 - 1-2 

then straightforward theoretical calculations show that the 

1/2 1/2 

eigenvalues of A are -lii(4-^/l3) ' and -3il(i/l3-2) ^ , 

Also tr(A) = -8, tr(A^) = l6, tr(A"’^) = -2 and tr(A = 
2(7-yi3)/9. Thus = -1/2, = 0 and = 1, = 0 for 

k=l,2,. establishing that 

T™ A-i, = -A/2 (4.2.8) 

k - oo ^ 

which is neither -I nor I. 

(c) Moreover, it is important to note that the 
coefficients and ^ given by (4.2.1) are not well defined 
in all cases. For, there may arise a situation when Y(Aj^)=0 
so that and are to be redefined. Two such non-trivial 
examples for which Y(Aj^) vanishes in the execution of the 
algorithm using exact arithmetic, respectively, for k=0 and 
k=2 are as followss 


n/3-2 

0 

0 

0" 



-0.5 

0 

0 

0 

d" 

0 

-1 

0 

0 

and 


0 

-2 

0 

0 

0 

0 

0 

-2 

-1 

A = 

0 

0 

-1 

-2 

0 

0 

0 

2 

0 



0 

0 

0 

-1 

-2 

— 



•J 



0 

0 

0 

0 

-1 
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Keeping the above observations in mind, we propose the 
following modification in the above choice of and 

“k = ^ ° (4.2.9) 

and 

= n/2tr(Aj^) , = tr(Aj^)/2n if Y (Aj^>=0. (4.2.10) 

In fact, only after developing the above choice 
independently, we came to know the choice of aj^ and 
defined through (l. 4, 49) -(1.4, 52) in Hoskins et al.[207]. 
However, in [207] it is not explained how this choice was 
obtained. In this section first we explain how we developed 
the choice of and given in (4,2.9) -(4.2.10) and then 
We present a theoretical proof of the convergence of Aj^ to I 
in the iterative scheme defined by (4,1.4) with the choice 
of given in (4.2.9) -(4,2.10) , when the eigenvalues of 

A s are all positive or all negative. 

Now we introduce some notation to be used in this section 
and the following one. We denote the class of all positive 
(negative) stable matrices in by (Sp , By SR^ (SRp 
we mean the set of all positive (negative) stable matrices 

rs/ 

in M with real spectrum. denotes the set of all nxn 

r^(X) 

matrices with n repeated eigenvalues and will refer to 

the set of all nxn matrices all whose eigenvalues are K 
SR^ refers to (1 SR^. Similarly SR^ may be defined. In 
all the analysis the sequences are defined on IN, the set of 
nonnegative integers. 
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The criterion for the choice of given in (4.2.9) 

is the minimization of the error functional defined by 

e(Y) = tr{(Y-l)^}, Y g . (4.2.11) 

The minimization of e(Y) is intended to take care of the 
convergence of the diagonal part of k-^ to I in the iterative 
scheme (4.1.4) . Since 

e(Aj^^l) = (4.2.12) 

evidently the two nomal equations involved in the minimization 
process are 

^ ^k^ ^ tr(Aj^) (4.2.13) 

“k^ + ~ tr(AJ^^) . (4.2,l4) 

The above system possesses a unique solution iff 'V’(A^) 0, 

and the xmique solution is given by (4,2.9) . Whereas if 

=0, the two normal equations represent one and the same 
equation and hence and can be chosen in infinitely many 
ways. However, for the sake of definiteness, we choose aj^ aud 
p as in (4.2.10), As our interest is to prove the convergence 
of (4.1.4) with the choice of as given in (4,2.9) -(4.2,10), 

in the rest of this section it is assiimed that A <s SR^* 

(The convergence in the case A s will easily follow from 
the corresponding result for A s SR^.) It may be verified, 
in view of Lemmas 4,2,1 and 4.2,2 to follow, that the choice 
of pj^ given in (4.2.10) satisfies both the normal equations. 

Now we shall list some observations of theoretical interest 
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related to the iterative scheme given by (4,1,4), (4.2.9) and 
(4.2.10) . 

LEMMA 4.2.1. Let A g SRJ. Then a(A) > 0, ff(A) > 0 and 
Y(A) > 0, Moreover, the equality holds in each case iff 
A s 

Pra^, By applying the power mean inequality [l6, p.l05] 
it can be easily shown that 


tr(A~^) ^ tr(A~~^) ^ ■ n. > tr,(A}. 

tr(A'“^) ” n "" tr(A) tr(A^) 

with the equality holding everywhere iff A « 

completes the proof. 


(4.2.15) 

which 

AAL 


LEMMA 4.2.2. Let A s SR^. Then in the Iterative 
scheme defined by (4.1.4), (4.2,9) and (4.2.10), Aj^ g SR^^, 
for each Mg TM, 

Proof. The proof is by induction. Suppose Aj^ g SR^\S^. 
Then by the previous lemma ^ ^ hence 

G SR^. On the other hand, if A^^ g SR^, then Y(Aj^)= 0 
and hence “ (n A^(Aj^)/2tr(A^)) + (tr(Aj^)/2nAj^(Aj^) ) = 1 

for i=l, . . . ,n where A^(y) , . . . , A^(Y) are the eigenvalues of Y. 
Thus we have shown that if Aj^ g SR^ then <= SR^, Since 

A^=A G SR^ the lemma follows . 

The following lemma will be usefiiL in proving our main 


theorem . 

LEMMA 4.2.3. If A g SRJ^\Sj^ and ^^ = aA + pA"*^ where 
a = a (A) and p = p(A), then e(A) is a continuous function of 


the eigenvalues of A. 
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Proof. Since A s SR'^\'S , we have Y(A) ^ 0 and certainly 

n\ n' - ' . 

a and P are continuous functions of the eigenvalues of A. 

Hence e(A) = tr{(^-I)^} = 2 (\‘(A)-1)^ = 2 {a^. (A) +(p/?^ . (A) ) -l}^ 

is a continuous function of the eigenvalues of A, This 
completes the proof. 

It is convenient to denote e(Aj^) hy e^^. It may be 
observed from the way in which and are obtained that 
{e^^} is a monotonic nonincreasing sequence bounded below by zero 
and hence it converges to a nonnegative number. The following 
lemma is a consequence of this fact and the two normal equations 
(4.2.13) and (4.2.14). 

LEMMA 4.2,4. The iterative scheme defined by (4,1,4) , 
(4.2.9) and (4.2.10) has the following properties. 

(i) ■tr(Aj^Aj^_j_^) = tr(A^) for all k 

(ii) tr(i^A^_^^) = tr(A^^) for all k sU. 

(iii) e^ = tr(I-Aj^) for k=l,2,,... 

(iv) e^ £ n. 

(v) tr(A^) = tr(A^) for k=l,2,.... 

(vi) tr(A^) < tr(A 2 ) £ ••• £ tr(Aj^) < . . . £ n. 

(vii) tr(A^^) 2 tr(A^^) 2 ^ ^" 0 ^ k=l,2,,... 

(viii) S. 1 k=l,2,,... 

(ix) a P, 2 1/4 for all k cU, 

K. JK. 

(x) 2 (■tr(A^)}/n for k=l,2,,.,. 
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= a^A^+ we have tr(A^i^^^) 

= a^trC^) f = tr(y . 

(ii) Its proof is similar to that of (i) . 

(iii) ¥e know expanding 

the ri^t hand side and simplifying with the help of normal 
equations), we arrive at each k cUNT, i.e., 

e^^ = tr(l-A^) for k=lj2,.... 

(iv) = tr{ (a^A^+P^A”^-!)^} < tr{(-I)^} = n, since a^, 
minimize e(A^) . 

2 

(v) Since tr{ (Aj^-I) } = ej^ = tr(I--Aj^) for k=l,2,...s, we 
2 

have tr(Aj^) = tr(A^) for k=l,2,,.., 

(vi) It is a consequence of the monotonicity of {ej^} and (iiO. 

0 and tr(Aj^) £ n we conclude that 
tr(Aj^^) 2 tr(Aj^^) . From 2 ^ snd (v) we have tr(A^^) > n. 

(viii) It follows immediately fron the first normal equation 
by using (vi) . 

(ix) From the second normal equation, by the well-known 
arithmetic-geometric mean inequality [l6, p.l06], we have 
{tr(Aj^^)}^ > 4naj^Pj^tr(A^^) . But ntr(A^^) > {tr(A^^)}^ (refer 
Lemma 4,2.1) for each k e3M. This proves (ix). 

(x) From the first normal equation and (v) we have 
tr(Aj^) = “^■^^(Aj^) + Pj^n. Now by making use of the fact that 
tr(A^) <, n, the result follows. 

The proof of the lemma is now complete. -AAA 


(vii) Since a(Aj^) > 
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In view of the above properties we have 
LEMMA 4.2,5. For k=l 52 ,... the following conditions, 
in connection with (4.1.4), (4.2.9) and (4.2.1C^ , are all 
equivalent to one anotheri 

( i) 6j^ = 0 , 

(ii) = ej^. 

(iii) tr(Aj^^^) = tr(Aj^) . 

(iv) s SR^. 

(v) A^ c . 

(vi) tr(A^) = no 

(vii) tr(A^) = n, 

(viii) tr(A^^) = n, 

(ix) tr(Aj^^) = n. 

(x) tr(Aj^) « tr(Aj^^) . 

(xi) tr(AjJ^) = tr(Aj^^) , 

(xii) = n. 

(xiii) tr(Aj^^Aj^^.^) = n. 

(xiv) a, +p, =1. 
k k 

Proof. (vi) <==> (vii), (vi) <==> (xii) and (viii) <==> 
(xiii) are all obvious, 

(v) ==> (iv) trivially. Conversely if (iv) holds then 
using tr(A^) = tr(A]^) we find that A^^ s 4==> (v). 

It is clear that (vi) ==> (i) ==> (v) ==> each one of 
(viii) to (xi) . In view of tr(AJ^^) >, tr(A^^) > n, (ix) ==> 
(viii) , From the inequalities tr(Aj^)tr(A^^) > n^ and 
tr(A^) < n, (viii) ==> (vi) and (x) ==> (vi) . Because of 
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2 0 and tr(A|^) < n, (xi) ==> (vi) . Thiis we have shown 
that (i) and (iv) to (xiii) are all equivalent to one another. 
Moreover (ii) <==> (iii) is immediate. 

To complete the proof it suffices to show that (ii) <==> 

(iv) and (vi) <==> (xiv) . If = e^, then aj^ = 1, = 0 

is a choice of the parameters. Hence Aj^ should belong to 

'§R^ (refer the proof of Lemma 4.2.1) . Hence (ii) ==> (iv) ==> 

(v) ==> (ii) . In view of the first normal equation, it 
easily follows that (vi) ==> (xiv) . On the other hand, if 

= 1, then again from the first normal equation we get 
= 1 or tr(Aj^) = n. If aj^ = 1 then = 0 which in turn 
implies that A^ <= SR^ and hence (vi) . This completes the 
proof of the lemma. 

An immediate consequence of the above lemma is 
COROLLARY 4.2.1. For k=l,2,..., if A^ s SR^\;^then 

(ii) tr(Aj^) < n < tr(Aj^) < trCAj^) . 

(iii) 0 < a +P <1. 

xC K 

Next we prove another simple lemma which will play a role 
in the proof of the main theorem. 

LEMMA 4.2.6. If for s SR+ then aj^ = = 1/2 for 

all k>kQ. In fact, it holds even for k=kQ except possibly when 

k =0. 
o 

Frcm Lemma 4.2,5 » we have «s SR^ ==> A^^ «s SR^ 
for k=l,2,... implying that V(Aj^) = 0. Hence by using (4.2.10), 
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we get a-^ - - 1/2, Moreover, if A,, c for k=k 

■tv 11 O 

then it holds for k > k^ also. This completes the proof .AAA 

Finally, we state one lemma due to Laasonen [297] which is 
an essential tool used in the proof of the main convergence 
theorem. 


LEMMA 


4.2,7. Let and {cj^} denote three 


sequences of real numbers connected by a^^^^ = c^a^^ ^k* 

{b^} and { have limits b and 0, respectively, the sequence 
{a^^} converges to the limit b. 


Now We prove the following main convergence theorem 
related to our modified algorithm for solving the Lyapunov 
matrix equation (4.1.2) assuming that A e SR^. 

THEOREM 4.2,1. Let 

\+l = “k\ ^k^^ ^ 

with A^ s SR^, where 

“k " ^k "" -^K ® if ^ 

and 

= n/2tr(A^), = tr(Aj^)/2n if Aj^ e SR^ (i.e. if Y(Aj^)=0i 

Then as k “k Pk "* -^k 

Proof. We shall establish this result in four stages. 

Stage 1 . Suppose for some k, say kg, Aj^ c SR^ . Then 
the proof is very easy. By Lemma 4.2.6, ccj^ = = 1/2 for all 

k > k^ and hence it remains to prove only that - I. In 

order to prove this, it is sufficient to establish the 
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convergence of the classical iterative scheme 

\+l = s SR^. (4.2.16) 
Hoskins, Meek and Walton have given a proof for this in [2O5]. 
However, the following direct, alternative proof may be given. 
From (4.2,16) , by a simple calculation we have 




If we put (Aj^-I) (Aj^+I) as Ej^, then it follows that 

2k 

^ • 

Since A^ g SR^, ^E^) < 1 and hence 


(4.2.18) 


E, = 0. (4.2.19) 

k 00 ^ 

As Aj^ = (I-Ej^) "^(I+Ej^) , clearly A^^ -* I as k -* 00. 

We remark that the above proof holds even if A^^ g S^, 
Stage 2 . We shall now consider the case when for all 
k gIN, Aj^ g SR^y§j^, so that aj^ = a(Aj^) and = P(Aj^) for all 

k gIH, It will be convenient to express the iterative 
scheme (4.1.4) using Jordan canonical form of the initial 
matrix A^, If = T”^AqT is the Jordan form of A^, from 
(4.1.4) , equivalently, 

= cc^Jj^ + = T“^Aj^T for every k gJH (4,2 .20) 

with J^ G SR^ and “ P('^k) • Note that Jj^ need 

not be the Jordan form of A^^ for k=l,2,,... However, Jj^ is 
an upper triangular matrix having the eigenvalues of Aj^ along 
its main diagonal. In fact, the convergence of Jj^ to I will 
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imply the convergence of Aj^ to I and vice versa. Note that 
ej^ = e(A|^) = e(Jj^) . Let us prove at this stage that -*• 0 
as k -*■ oo. 

The boundedness of {ej^} indicates that each one of the n 
sequences i=l,...,n is bounded. Hence by Cantor's 

diagonal process there exists a subsequence {k } of nonnegative 

Jr 

integers where < k2 ^ ... such that ^i 

as p -*• oo for each i=l,...,n. Since a sequence of positive 

numbers can converge only to a nonegative number, we have one 


of the following situations. 

(i) for all i=l,...,n. 

(ii) >-.=0 for at least one i and at most (n-l) indices 
1 

i s {l, .. .,n}. 


(iii) > 0 for all i=l,...,n and for at least one 

pair (i,3) . 

(iv) For some X > 0, \ for all i=l,...,n. 

Ass um e , eventually to reach a contradiction that = 0, 

for all i=*l,,..,n. Since e^ £ n, it follows that 62 n. 

Hence there exists some q s ]N such that ei^ ^ n and evidently 

q 

for some c ^ 0, e-i ^ n*“C. Moreover, from the assumption, 

Kn 


Therefore, it is true that for any given c > 0, 


ej, = n. 

p-00 ^p 

jej^-n| < c/2 holds for all sufficiently large values of p. 

Hence for some r > q, ej^ > n-(c/2). Since } is a monotonic 

3? ]P 

decreasing sequence, we have ej^ < which shows that 
n-(c/2) < n-c and we have a contradiction. 
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Next, suppose S = . ,i^} , where 0 £ m < n-1, to be 

the set of indices i for which vanishes. Clearly, without 
loss of generality, we can assume that along a subsequence 
{k } of the subsequence {k }, 

±5? P 


lim 

P ^ oo 


^i (Jkn ) 
o l^P 

A. (J ) 
1. k_ 
a i»p 


c , c. <= Eo,oo) for 

J 0 

O=0,l,...,m. (4.2.21) 


To simplify the notation let us put 4 ) = A . 

d i.p ). 

Let » . . . , ^ be the nonvanishing A!s. Note 

1 2 n-m-1 1 

that there is at least one nonvanishing A^. It is not 
hard to see that 


)"^{S^)(l+Sc?) - n(\i )"^(H-Ecj) 

2 ^ 0 (4.2.22) 

(\ )"2(Z^if)(l+Zc2) - n2 
■^0^0 


where in the summations i varies from 1 to n-m-1 and j varies 
from 1 to m. Simplifying further, we see that 


a, ^ 

^1,P ’ 24 


HP. 

^ > 0. 


Hence 


lim ■ 

P ^ oo K' 


tti, > 0. 

i»P 


(4.2.23) 


(4.2.24) 


Since 


^1»P ^ ^1»P O - o 


X-r + Pv (^1 ) 


I.P 


and a, ■ is bounded we have 

^l.P 


(4.2.25) 
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O IsP ^ l»p O 


(4.2.26) 


On simplification 5 the limit on the ri^t hand side becomes 
2 

(l+i:cj)/(l+Ecp which is nonzero. In particxolar it follows 
that 


= 0 . 


(4.2.27) 


If i is such that?c 0, using (4.2.23) and (4.2,27) it follows 
that 


H-l> > 0- 

p ^ oo 1 


(4.2.28) 


Hence the set of j’s for which 


From the 


4 (J . J = 0 (4.2,29) 

P ~ ^ ^l,p ^ 

has at most (n-2) elements, Rit ICt +1 = . From the 

1»P P 

f (l)i 

above analysis we conclude that along the subsequence t^p ; 
of nonnegative integers, each one of the n sequences 
{^•(J /'T'l)}, i=l,.,.,n converges to nonnegative nimibers and 

i^i; 

among them at most (n-2) limits are zero. Continue the above 

process of constructing subsequences along which each time the 

number of zero limits is decreasing at least by one. Hence 

after at most (n-l) such constructions, we will have a 

subsequence, call it {s }, of nonnegative integers where 

P 

< s^ < S 2 < ... such that )} converges to a positive 

P 

number say for each i=l,,..,n. 

Suppose ^ d^ for at least one pair (i,i) and let 
D = diag(d^, . . . ,d^) . If for each k =1^, denotes the 
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diagonal matrix having the same leading diagonal as that of 
obviously 


lim . 

p -* oo p 


= D, 


(4.2.30) 


Let converge to e. NoW; 


n 


e = e, = )“1}^ = .2_(d,-l) 


P- 


T P- 


£=1‘ ^ s 


i=l 


On the other hand, if we denote e(Dg by 


e = -- e. .. = eCD, 


lim 

p^co "Sp+1 


^ = e(D).(4.2.3l) 


(D ) , then 
s 

P 

(4.2.32) 


Upon using the continuity result exhibited in Lemma 4.2.3 » 
the last mentioned limit becomes e^(D) which is same as 
e(aD+pD”^) where a = a(D) and P = p(D). Since D c 
we know that e(aD+pD~l) < ^( 0 ) , Thus in view of (4.2.31) 


we have a contradiction. 

Still we are left with the case that d^ = d 2 = . • • " '^n " ^ 
sav. Then \(Jc ) ”* d fcr every i=l,...»n and hence for any 

i Dp 

given £>0, there exists such that P > Pq implies 

j\.(j )-.di < e for every i=l,...,n. Thereforej 

* 1 s_ V 


P 

d + e6. 


i,Sp 


for all p > p where each 0^^ g s (-l»l) • Since 


= .2 (a \(Js ) + Ps 

i=l % 1 ®p ®p ^ ^p 


,-l 


(4.2.33) 


where a. and pg are the unique optimal values for 

p P ^ 

to be minimimi, it follows that 


1+1 i*l^ * 


(4.2.34) 
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"1 1 m 

it follows that — c» “ 1/2. By similar computations it 
can be shown that |S^ -» 1/2. 

It may be noted that since -*• I, tr(Aj^) = tr(Dj^) -♦ n 
and tr(Aj^) = tr(Dj^) -* n. Hence once we prove that a.-^ -* 1/2 
then from the first normal equation also it follows that 

Pk - 1/2- 

Stage 4 . Nov/ we have all the required material to 
complete the proof of our main theorem. To prove -* I, 

We follow a procedure used by Laasonen [297] for Newton's 
method. If J_ i=lj... 5 ,r are the elementary Jordan blocks 
of Jq so that Jq = where denotes the direct 

sirni, then we have 

^k+l,i “k'^k,i ^k'^k,i» (4.2.36) 

r 

where = a(Jj^), and Jj^ = ^E^{+)Jj^^^. It should 

be noted that J^^ is an upper triangular Toeplitz matrix [293] 
for k s3Nj i=ls,,..,r. Let for a fixed i> 


J 


k,i 


(k) ^(k) 
1 

(k) 


o 

0 




0 0 
0 0 


a ei 


^(k) ^(k) 

o 1 


0 


d(k) 


which may be denoted by [d^^^ ^ 1 ^^* *•*» ^m-1^ 
convenience. From (4,2.36) we have 
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'^k+l,i “ “k'^k,i ^k^m* (4.2,37) 

By performing the matrix multiplications and equating the 
corresponding elements on both sides we get 

d(^'*'^) s= a, d^^^d^^^ + |3, 

0 0 k o o "^k 

and 


(4.2.38) 


os 1 S— 1 


= Ct + ... + dWaW} 

k o s 1 s-1 « ^ 


s o 

for s*l y . • . ♦ 

Equation (4.2.38) is consistent with the facts a 
p -* 1/2 and d(^) 1. Solving (4.2.39) for d(^'^l) we find 


(4.2.39) 
k * 1/2 


^(k+l) ^ ^2c(. - + (1/d/bp (4.2.40) 

S xC O O 5 O 


(k) ^(k) ^(k+1) 


"s-1* ^1 * 

d^"^^ -» 0 then by Lemma 4.2.7i 


where P is a quadratic polynomial in d^ , . . . , 

.... d//b I£d<*'bd/>, - 

-*0, Writing the equation corresponding to s=l, from 
s 

(4.2.39) we have 

d/*^' = {2a^^ - (d/^^Vd/bldj*"! 


Again by Lemma 4.2.7, 


^u+i;/^u;)}dV^-» (4.2.41) 

o ' o * 1 • 

0. Hence by induction, we have 


11® d^^^ = 0 for s=l,2o , . . ,m— 1. 

k oo s 

Thus we have established that 


(4.2.42) 


lim j _ j 
k -* oo k,i ® 

It is true for i=l,...,r and of course m depends on 
This completes the proof of our main theorem. 


(4.2.43) 

i. 

AAA 
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REMARK 4.2,1, The result of the above theorem remains 
valid even if the assumption A e SR^ is replaced by A « SR^. 
For, as may be easily verified, in this case Aj s SR^. 

"f* mm 

If A «= or then in view of the example provided in 
observation (b) in the beginning of this section, A^^ need not 
converge to I in general. 


REMARK 4,2.2. If A is any nonsingular 2x 2 matrix, then 

A^ = I, i.e., the convergence takes place in a single iteration. 
Si lb 

For, if A = [q (j] where ad-bc^O, it is easily seen that 
whether the eigenvalues and of A or equal or not, 

= l/(^^+^2), pQ = (4.2.44) 


so that 




a 4 . ^1^ 

c d-* XT+XT 




j. d -b. _ 1 o-. 

i>-c a-i “ Lq 


because a+d=X 2 ^+X 2 , 

REMARK 4.2,3. If for some k, Y (Aj^) is sufficiently small, 
the eigenvalues of are almost equal. Therefore from next 
iterate onwards the method starts behaving like Newton’s 
method, i.e., ay. and come close to 1/2. In this situation 
there is little point in working with the formulae given in 
(4.2.9) . Hoskins, Meek and Walton [207] suggest to switch 
over to the formulae given in (l.4,4l)-(l,4,44) . There the 
computations of and are based on some noims of Aj^ and 
However in our simulations we have found that our 
choice given by (4.2,10) compares favourably with the choice 
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of Hoskins et al.[208] and it does not require so much 
computations . 


Finally, we present some numerical simulations which 
provide a comparison of the performance of the iterative 
scheme (4.1.4) for the following four choices of aj^ and 

(1) Newtont s method ~ ^k ~ • 

(2) The choice given in (l,4.4l)-(l,4,44) due to 
Hoskins, Meek and Walton [208], 

(3) The choice given in (r.4. 45) -(1.4.46) due to 
Barraud [l63]. 

(4) Our least squares choice given in (4,2.9)-(4.2.l0) . 

In the following examples the symbols N, H, B, L denote 
respectively the above choices (l)-(4). We present the 
matrix A and the number of • iterations required for these 
choices to have the convergence upto six decimal places. All 
the computations were performed on the DEC 1090 using FORTRAN 
IV with the single precision at the Indian Institute of 
Technology, Kanpur. 


EXAMPLE 4.2.1. 

A ^ f 30.0 32.89 

9.0 10.00 


N*-8 , H— 4 , B--4 , L«— 1 , 
EXAMPLE 4.2.2. 

. ■“ 2.5 

A = 2.0 

1.5 


5.5 

6.0 

-1.5 


2.0 

2.0 

2.0 


N-6, H-4, B-4, L-2. 
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EXAMPLE 4,2.3. 


N-5, H-7, B-7, L-4. 


EXAMPLE 4.2.4. 


f f L»4 « 

EXAMPLE 4.2.5. 


N-6, H-5, B-5, L-3 


0.2 4.0 6.3 -5.4 
1,4 -6.5 6.6 

0 3.2 - 0.6 
5.0 


11111 
1 2 2 2 2 

1 2 3 5 3 
1 2 3 4 4 
1 2 3 4 5 


O 


3 4 5 § 

3 4 5 6 
3 4 5 6 
4 5 6 


EXAMPLE 4.2.6. 


4.5 2.3 1.4 -1.2 3.6 1.4 

2.1 4.1 1.2 -3.5 6.7 

1.2 0.8 1.0 -1.7 

n o. 8 —2*1 1.5 

5.1 - 1.2 


N-5, H-6, B-6, L-5, 
EXAMPLE 4.2.7. 


5^4 O 

4 4 4 3 
3 3 3 3 2 
2 2 2 2 2 1 
111111 


N-7, H-5, B-6, L-4. 
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EXAMPLE 4.2.8. 

A = J^o(l) 

where is defined as the mxm matrix whose diagonal 

elements are all X and the superdiagonal elements are all unity 
and the remaining elements are all zero. 

N— 4 p H— 4 5 B“4 p L— 4 . 

EXAMPLE 4.2.9. 

A = J^(1.8) + 12(0.79) + J^(1.02) + J^(1.55) 

+ J^(2.52) + 1^(4. 36) + 13^(6.21) 

N-6p H-5, B-7 p L-5. 

Frcm the above examples it is clear that the least squares 
choice compares favourably with the remaining three choices. 

4.3. Another Iterative Method for Solving the Lyapunov 

Matrix Equation 

In the algorithm of the previot;is section, the proof of the 
convergence required that the spectrum of A be real. In this 
section we develop another choice of and for which this 
restriction is not necessary. However in the case of 
the algorithm of this section, the assumption that the matrix A 
be normal will be required. Thus the method developed would 
be applicable for an arbitrary positive or negative stable 
normal matrix. 

The criterion for determining aj^ and for the algorithm 
of this section is the minimization of the functional 
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2 

- tr{ } . The corresponding normal 

equations become 

2aj^tr(A*Aj^) + Pktr(A*Aj^+Aj\) = tr(A*+Aj^) (4.3.1) 

and 

(4.3.2) 

In the derivation of these equations, it has been assumed 
that and are to be real. Hence if for X s we define 

Y(X) = 4tr(X^)tr(x“*x“^) - {tr(X*x“^+x”*X) (4.3.3) 

a(X) = a(X)/Y(X) (4.3.4) 

P(X) = p(X)/Y(x) (4.3.5) 

where 

a(X) = 2tr(X*+X)tr(X”*X“^) - tr(X*X’^+X“*X)tr(X"*+X“^) 

(4.3.6) 

?(X) = 2tr(X“*+X“^)tr(X*X) - tr(X*X"'^+X“*X)tr(X*+X) 
then (4,3.7) 

~ ~ provided "''"(Aj^) 0. (4,3.8) 

It may be noted that the expressions of a(X) , j3(X) andY(X) 

in this section are different from those of the preceding sectici. 

If KAj^) =0, as in the previous section, put 

ay. = n/ZtT{A^ and = tr(Aj^)/2n. (4.3.9) 

As remarked before the convergence of the algorithm with the 
present choice of ay^ and will be p3X)ved only under the 
assumption that A s NS^ where NS^ ~ ^n ^^n* If A is not normal 
the method may or may not be applicable as is clear from the 


following two examples. 
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EXAMPLE 4,3.1. With A = simple calculations show 

that A^ = A 2 = ... = I. Hence the method works in this case 
even though A is not normal. 

EXAMPLE 4.3.2. With A = 

tr(A*A) = 651, trCA"^"^) = 219, tr(A*A"^+A“*A) = 438, 
tr(A*+A) = tr(A *+A ^) = 6 and hence = 0 and = 1/73 
It implies that A^ = (l/73)A~^ and consequently that 
Aj^ = ( 1 / 73 ) A ^ for k=2,3,... and hence ihe convergence is to 
the matrix (1/73)A"’^ which is not equal to I. 


1 6 24 

9 i ^ 
0 0 1 


Before stating and proving the main convergence result of 
this section we establish some preliminary lemmas and properties 
of the new algorithm. 

LEMMA 4,3.1. For A <= S^, Y(A) 2 ^ "the equality holds 


iff A is a positive scalar matrix. 


w-1 

^r^of. Writing A = (aj,g) and A” = have 


rsJtr^JrsJT'.jr^ 


Y(A) - 4 E ^ l^rs^^ ^^^Ss^rs‘*'®rs^rs^ ^ (4.3.10) 

where in each summation r ,s run from 1 to n. If we put 

Where i = x^^, y^^ > 0 and 0 < 0^^, 0 ^^ < 2Jt then 


from (4,3.11) 


Y(A) = 4 Sx^gSy^g - M^^syrs^°®^®rs“^rs^ (4.3.12) 


and hence 


Y (A) >4 Ex^ Ey^ - 4(Ex v ^)' 
rs ■'rs rs‘'rs 


(4.3,13) 



182 


with the equality if and only if either 


e 


0, 


pg — f'pg for all r>s— l,,,,yn 


or 


(4.3.14) 

(4.3.15) 

(4.3.16) 


= % for all r,s=l,...,n. 

X'S j7S 

Moreover, by the well-known Cauchy's inequality 

2y^ > (2x y )^ 

rs "^rs — rs-'rs 

with the equality iff 

X = p-y , u> 0 for all r, 3=1, ... ,n. (4.3.17) 
rs ^-^rs’ 

The positivity restriction on in (4.3*17) is due to the 
fact that ypg 2 ^ A / 0. From these arguments 

it is clear that Y(A) 2 ^ snd the equality holds iff in addition 
to (4.3.17) either (4.3.14) or (4.3.15) holds. In other words, 
7 (a) = 0 iff A^ is a positive or negative scalar matrix. 

Since A «5 S^, if we use Jordan canonical form, it is not hard 
to see that Y(A) vanishes iff A is a positive scalar matrix. 

This completes the proof. 

LEMMA 4.3.2. For A c NsJ, a(A) 2 Oj ^(A) 2 0 and in 
either case the equality holds iff A is a positive scalar 
matrix. 

Proof, Let U be the unitary diagonalizer of A so that 


/\jrsjrsJnJr\} 


U*AU = D where D = diag(4^, . . . ,4^) such that Re(A^) = x^. > 0 
and Ini(4^) = y^ for i=l,,..,n. Then 


n 


a 


(a) = a(D) =4.2 


n 




C"^x^+y^ 
G 0 


n n x^ 

« 4 E ^ — i Z 




n 

= 82 


f£l 


^"^(x?+y?)^ 


+ 4 


n (x^+x^){(x.i-x.) +(yf+y^)} 


41 


i»3=l 

i<3 


,2 2v/ 2^ 2^ 

(xpy^) (x.H-y .) 
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Since > 0 and is real for i=l,...,n, it follows that 
a (A) 2 0 and the equality holds iff for every pair 

(i,o) and y^ = 0 for each i. This completes the proof of the 
lemma concerning a (A) . The other part follows immediately 
from the fact that p(A) = aiA"^) and A"^ c NS^^. AAA 

We now state some useful corollaries which are immediate 
consequences of the previous two lemmas, 

COROLLARY 4,3.1. If Ass NS^ and is not a scalar matrix, 
then Y(A) , a (A) and P(A) are all strictly positive. 

It follows from Corollary 4.3.1 that for A « NS^ the 
choice (4.3.9) for aj^ and in case y(Ay.) = 0 also satisfies 
the normal equations (4.3 .1) -(4.3.2) . 

COROLLARY 4.3.2. Over the set of non-scalar NS^ 
matrices A, a (A) and P(A) are continuous functions of A. 

COROLLARY 4,3.3. If A s NS^ and is not a scalar matrix 
and if B = aA+PA"^ where a = a(A) , p = P(A), then |jB-l||^ is 
a continuous function of A. 

LEMMA 4,3.3. In the iterative scheme defined by (4,1,4) , 
(4,3.8) and (4,3.9) if Aq = A <s NS^ , then Aj^ c Nsj for each 
k <=3N. 

Pra^, The proof is by induction. Suppose A^ s 
If Aj^ is not a scalar matrix, in view of Corollary 4.3.1* 

“k* ^k ^ It is clear that 

c NS^. On the other hand, if A^ is a scalar matrix 
then by (4.3.9), A^^^^ = I which is obviously a normal positive 
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+ 

stable matrix. As s NS^ , the lemma follows. AAA 

LEMMA 4,3.4. Let e^^ = 

(a) {e^} is a monotonic nonincreasing sequence bounded 
below by zero and hence it converges to a nonnegative number. 

(b) If, for a certain k, say k^, A^ is a scalar matrix, then 

= I and hence ~ " i/2 for every k > k^. 

(c) If for every k s 3M, A^ is a non-scalar matrix then 

Y(Ak) ^ 0, aj^ and are positive and are given all through 
hy (4,3,8) . Moreover, in this case is a strictly 

decreasing sequence. Also e^ < n. 

This lemma plays an important role in the course of 
proof of our main convergence theorem. Its proof, being 
simple, is omii^ted. 

Some properties of the algorithm of the present section, 
which are analogous to those of the previous section, are 
summarized as follows? 

THEOREM 4.3.1. The iterative scheme defined by (4,1,4), 

(4,3.8) and (4.3.9) has the following properties, given 

A e NS"^. 
n' 

(i) Re{tr(A^A^^^) } = Re{tr(Aj^)}, k«=B. 

(ii) Re{tr(Aj^ -^k+l^ ^ ~ Re{tr(Aj^^) } , k cH. 

(iii) e^^ = Re{tr(l-Aj^) } , k=l,2,,.., 

(iv) Re{tr(A 2 ^)} < Re{tr(A 2 )} . • . £ f^e{tr(Aj^)} ... £ n. 

(v) tr(A^Aj^) = Re{tr(Aj^) } , k=l,2,,... 
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(vii) Re{tr(AjJ^)} > Re{tr(A^A^^) } , k=l,2, 

(viii) Re{tr(A*AJJ^)} < n, k ss3N. 

(ix) > {f^e(tr(Aj^)) }/n, k=l,2,.... 

Proof. (1) From (4.1.4) , 

Considering the real part of the trace on both sides and 
by making use of the first normal equation (4,3.1) > the 
result follows . 

(ii) This can be shown by making use of the second normal 
equation. 

(ii) We know = tr{ (A^_j_^-I) (Aj^_^^-I) } . By expanding 

the right hand side and simplifying with the help of normal 
equations, We arrive at ~ Re{tr(I-A^^ 2 ^) } for each k 

and from this the required result follows. 

(iv) It is a consequence of the monotonicity of and 

the above result, 

(v) Since tr{ (A^-I) (Aj^-I) } = = Re{tr(I-Aj^) } , we have (v) . 

(vi) e^ = •fcr{(a^A*-*-|3^A~''-l)(a^4j-PQA~l-I)} < tr{(-I)(-I)} =n 

♦ 

since a , P minimize e(A_). 

0 0 i 

(vii) This follows from the fact that ^ (Aj^) > 0 and 
property (v) , 

^ '*1 

(viii) Since A^ is normal, we have Re{tr(Aj^AJ^^) } 

= Re{ 
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(ix) By applying property (viii) to the first normal 
equation we have 2Re{tr(A^)} < 2aj^tr(A|jAj^) + 2(3j^n. Since 
tr(AjAj^) = Re{tr(Aj^)} < n, it follows that > {Re(tr(Aj^) ) 

Our next goal is to prove the following main convergence 
theorem. 


THEOREM 4.3.2. Let 

^ \+l "" “k\ , k 

with Aq e NS^ where and are given by (4.3.8) and (4.3.9) . 
Then as k Aj^ -* Ij -*■ l/2 and j3j^ - 1/2. 

Proof, If, for certain k, Aj^ is a scalar matrix, the result 
is obvious because of Lemma 4.3.4. Thus we can assume that 


for every k « M, A^ is not a scalar matrix. ¥e have 
therefore, > 0 for every k s3N. 

As Aq can be reduced to a diagonal matrix by unitary 


transformation, it is clear that there is no loss of generality 


in assuming that for every k cBT, in the algorithm, A^ is 
diagonal. If we write A = diag(k^^^ s . . . » * then 

V u(k) ,2 ^ 

e, = 1 I “1 1 . Hence the boundedness of 167 } implies 

1=1 \ ^ 

that each one of the n sequences , i=l,...,n is bounded. 

Now by Cantor's diagoneuL process there exists a subsequence 


{k } of nonnegative integers where k < < k^ < . . . such that 

P ° (k ) 

^(kp) _ as p - 0 °, for each i=l,,..,n. Since { } 


is a sequence of complex numbers with positive real parts, 


Pe(A^) 2 i=l,,..,n. Further, in the iterative process, 

due to the positive stability of the initial matrix A^ and 
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the positivity of a simple geometric reasoning reveals 

that larg( \ I < )arg( ) I for i=l g , . . gn, where 

argCM denotes the principal argument of X so that -ti < \ < %, 
In view of this, it follows that ReC'Xji^) _> 0 and equality holds 

r\J 

iff Consequently, we have any one and only of the 

following three cases; 

(i) = 0 far all i=l,...,n. 

(ii) = 0 for at least one i and at most (n-l) indices 

i s {l,,,.,n} and Re(\.) > 0 if 0. 

0 0 

(iii) Re(X^) > 0 for all i=l,,..,n. 


If 'X^= 0 for all i=l,,,.,n, by Lemma 4.5.4(c), we have 


« > \ > p- 


oo 

irv>«y 


= n which is a contradiction. 


Next, suppose S={ i^, i^, . . . ,i^} , 1 <, m <, n-l, to be the 

set of indices i for which 'X^ vanishes. Again by using 

Cantor's diagonal process we can always find a subsequence 

{k } of {k } such that for some j <= {l,,,.,m} all the m limits 
l»P P ^ 

lim j ^ q 

P 

Xi l.P 

exist and are bounded by unity in magnitude. We can always 
assume that 1.^=1 and the remaining elements of S to be 
2,5j«..»nif since a permutation transformation does not affect 
the set up of the algorithm. Now let 




c. where |c.| < 1, i=l,...,n. 

(4.3.18) 
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In the following analysis, unless otherwise specified, 
the summations correspond to values of i from 1 to n» Putting 

,(kn ) 

A. j-,p - g i=l,...,n it is easily seen that 

1 IfP r w ^ ^ 

(Reig, - {ReE(g, HReS (f^)} 

■ji2 rUE_ 


ai 


l,p 


»p gj, 


1,P 


g 


1,P 


g 


a,p 


(4.3.19) 


We know that g. -» 0 for i=l,..,,m; (g-, V „) "*■ for 
i,p l,p inP 

i=l,...,mj and (g^ p/gj_ p) 0 ^or i=m+l,...,n. Therefore 
we have 


lim 


m 9 1 4 ^ n 

{l+ 2 |c. 1 } {Re I g. } 
' is2 1 ^P - i=m+l^i»P^ 


a, 

oo k' 


m 


^•p “ Jg, J^] 


■i=m+l i»P 


n ^ 
Re 2 
i=m+l 
n ^ 

s I'k I 

i=m+l ^ 


= A , say. 
2 


It may be noted that «- > 0. By similar computations as above, 
it can be shown that 


lim p 
P ~ ^l,p 


= 0 . 


(4.3.20) 


Hence for i=m+l , . . . ,n, 

lim ^^l,p''’^^ _ lim 




l,p 




which has a positive real part 
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Next We prove that the sequence } possesses 

a convergent subsequence whose limit has a positive real part. 
For this. 


(k-, ^+1) 


X. = a, x'"l.P . J_\, 


) 


P 


(4.3.21) 


J- JrS.- “j 

i,P ”i,p'"i 

The first term on the right hand side of (4.3.21) tends to 
zero. For the second term, we easily verify that 


lim 

p oo 


lim 

p oo 




i.p 


(Vgi,j,)Rer(l/g,^p)E|g,^p| - (l/gi,p)(ReEg.^p) 






lim 

P ->• OO 


g {ReE(l/g )}2|g. ^ {Re2g } 

1»P i»P i»P l,p i,p^ 






In the last expression the second terms of the numerator and 
the denominator tend to zero. The first term in the denominator 
converges to J l\l^(l+ ? I cj 3 since E|g, |^-* I )I|' 

a. m-rx i=:2 i=m+l ^ 

^ 0, to prove that the required limit exists and has a positive 
real part it is therefore sufficient to prove that 


p^— Sl.p 
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exists and has a positive real part. If we now take 

Re(g ) = a and Im(g. ) = b . then 

i,P i,p r.n 


i,p 


P 


"■"“ooMi Re .E (1/g )}= . v 

1=1 X,P p->00 ^fP ^ 

i,p i,p 


Since a . > 0 for 1 = 1 ^ ... 

ipp 


a. 

^ ^ ' q" p - - 

IjP i=l +b 

i,p i,p 


m 

S 




2 2 
a +t) 

1,P l.P 


^1,P^ 

= cos (©i^p) 


where 


®l,p ' ltan-d^}|. 

l.p 


¥e know < ©q < 7t/2, where ©q is the argument of the 

«L y 

eigenvalue * Hence 


m 


Re{g, Re E (l/g. )} > cos (© ) > 0 

T.,p 1=1 i,p o 


( 4 . 3 , 


Now 


m a. 


aT +h 

i,p i,p 


m 


“♦1+ Ejc.l asp-oo. 

i=2 ^ 

n 

Hence the sequence {g- Re .E (l/g )} is bounded and 

l»P 1=1 i»P 

therefore possesses a convergent subsequence. In view of 


22 ) 
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(4,3 « 22) the real part of the limit of the subsequence is 
positive. Thus as has been the case with the proof of 
Theorem 4,2,1, in at most n-1 constructions of such subsequences, 
the case (ii) leads us to a consideration of the case (iii) . 

Thus assuming that Re(^^) > 0, i=l,...,n, we are to 
show that I, or equivalently 0* T4e proof of this 

closely resembles the analysis of the cases (iii) and (iv) of 
the proof of Theorem 4.2.1 and hence is omitted. 

Finally, we are left to prove that aj^, l/2» Since 

1, i=l,...,n, for sufficiently large values k, 

= 1 + e. , where e. , - 0 as k -* for i=l,,,.,n. 
i i,k i,k 

Let us use the notation 


s_ = E e 


l.k 


S-, = E e . 


s = 
2 


i,k 
E 

^ ^i,k 


s^ = E e. 


s_ = E e 


i,k 

e 

i,k i,k 


By straightforward calculations we have 


tr(A|+Aj^) = 2 n . 

tr(A*A^ » n + Si + + Sg 

■tr{A^ ~ — Si — Si + S'2 + ^2 + •••• 

tr(A*AjJ^+A”*A^) = 2 n + S2 + S2 -2S2 + . . . 


and 
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where the neglected terms arfe of third and higher order. 
Therefore, 

a(A^) = Sns^ - 2s| - 2s| - + ... 

pi (A ) = 8ns - 23^ - 2s2 - 4s s + ... 

and 

Y CA^) = I6ns2 - 4s^ - 4s^ - + . . . 

Nov 7 if we put = a^and = b^^for convenience then 

Sns^ - 2s^ - 2s^ - 2(a?+b?) - 8(ra^)^ 

= 8{n2a^ — (Saj^) J + SnSb? 

2 0 

with equality holding iff b.=0 for all i=l, . . . ,n and a. = a. 

^ d 

for all pairs (i, j) . Since A^ is not a scalar matrix this 
will not happen. Hence 8nS2 - 2s| - 2s2 - 4s^f^ > 0. 

Thus a(A^) = a(A^)/Y(Aj^) 1/2. Similarly P(A^) 1/2, 

This completes the proof of the main convergence theorem. -<UA 

remark 4,3.1. The result of the above theorem remains 
valid even if the assumption A <s is replaced by A c 
where NS = N (1 S” . For, as may be easily verified, in 

XI. 



REMARK 4.3.2, If A is Hermitian the algorithms of this 
section and the previous section are one and the same. 

REMARK 4.3.3. In contradistinction to Remark 4,2.2, in 
the case cf the present algorithm the convergence for the case 
of a 2x2 matrix need not be in just one step. This is 
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easily seen by taking A = J], By direct computations 

^o ~ therefore, 

r iUiil 0 1 



Our numerical simulations confirm that the algorithm 
presented in this section is considerably faster. We find 
that the algorithm requires three to five iterations. For 
the following two matrices A c NS^ we verified our method 
and in each case it took only three iterations to achieve an 
accuracy of six decimal places • 


7 

3 

-1 

2 


10 

-1 

5 

1 

1 

0' 

2 

7 

3 

-1 


0 

10 

-1 

5 

1 

1 

-1 

2 

7 

3 


1 

0 

10 

-1 

5 

1 

3 

-1 

2 

7 


1 

1 

0 

10 

-1 

5 






5 

1 

1 

0 

10 

-1 





-1 

5 

1 

1 

0 

10 


For certain nonnormal matrices also we Just tried the 


method. In certain cases the method converges. For 
example, by taking 


A 


1-1 0 0 ” 
110 0 
0074 
0 0-23 


we find that the method converges in four iterations 
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4.4, The Kac2marz Projection Method for Solving AX+XB=C 

The well-established Bartels-Stewart algorithm for 
s olving 

AX + XB = C (4.4.1) 

is a direct method, while in general, iterative methods are 
preferred for solving large systems. The rapidly convergent 
iterative method suggested by Hoskins, Meek and Walton [205] 
as described earlier involves inversions of two matrices in 
each iteration and this method is applicable only when either 
A and B or -A and -B are stable. Moreover, it is well-known 
that the inversion of very large-order or nearly singular 
matrices is rather unwieldy. Furthemore, in the above method 
the original matrices A and B are altered as the computations 
proceed and there is no particular advantage to be gained 
when A and B are sparse, i.e., have a large number of zero 
elements or that the matrices have some systematic arrangement 
in their entries so that they can be easily generated without 
having to be stored in the core. Systems involving 
large-order sparse matrices arise, for example from the finite- 
difference approximations for partial differential equations 
as WG have seen in the introductory chapter. Another 
iterative method 

AX^_^^ = -X^B + C (4.4.2) 

for solving (4.4.1) mentioned in Varah [272] converges iff 
f(B^@'A'”^) <1. In any case, efficient algorithms have 
seldom been reported for singular or inconsistent cases [305] . 
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Keeping the above points in mind, in this and the following 
section we propose two iterative schemes to solve (4,4. l) for ' 
large sparse systems even when the system is singular or 
inconsistent , These schemes are based on our observation 
that the projection (due to Kaczmarz [296]) and the residual 
projection methods (due to Nekrasov [ 299 ] (refer Forsythe [29O] 
and Hestenes and Stein [ 294 ])) could be very efficiently 
implemented forthe matrix equation AX+XB=C, as well as its 
other generalizations mentioned in Chapter 1, For 
definiteness, however, we shall restrict ourselves to the 
case of the Sylvester equation only. The theoretical 
unconditional convergence and the characterization of the 
solution so obtained for these methods for the case of a 
general linear system ibc = b be 2^) have been 

established respectively in Tanabe [305] and Rathore [ 30 l] , 

In either of these methods we generate a sequence of 
matrices which converges for arbitrary A, B, C and X^, 

In both the methods, the limit of the sequence, say 
represents the unique b-olution in case the system is 
nonsingular (i.e. if A and -B do not have a common eigenvalue). 
If the system is singular but consistent, then in both methods, 
Xoo represents some solution of the system. In particular if 
we choose Xq = 0 then the Kaczmarz projection method gives 
the minimum Frobenius norm solution. On the other hand. 
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if the system is inconsistent then ^ obtained by the residual 
projection method corresponds to a least squares solution 
(i.e., X = Zoo minimizes ||C-AX~XB11) . In this context, it may 
be referred that Lovass-Nagy and Powers [233] have suggested 
a method to compute least squares solutions and the method 
relies on a strategy of replacing C by G which is the orthogonal 
projection of C on the range of the operator A. + .6, so that 
the new equation AX+XB=G becomes consistent and then solving 
this consistent system by some direct method. 

Some advantages and disadvantages of the Kaczmarz and the 
residual projection methods are as follows. In both these 
methods the convergence of the process is theoretically 
guaranteed. However as is the case with several first order 
iterative methods, in certain cases the rate of convergence 
could be rather slow. Thus, as a rule, the methods are 
recommend^ only for large systems, where these have a better 
chance of comparing favourably with direct methods. A 
practical strategy would be to first test run the method to see 
if the convergence is quick. If not, one has to introduce 
suitable modifications such as relaxation factors after 
observing the iterates in a known solution case, to accelerate 
the convergence . 

Now we shall formulate the Kaczmarz projection method 
for solving (4,4,l) . As a prelude, we first describe the 
basic projection algorithm [305] for solving the linear system 
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Ax = b, where A = (a ) c M x e and b = (b-,,,..,b^) eC®. 
Here it is assumed that all the rows of A are nonzero. As 
at earlier occasions we use A^^^to denote the o-th column of A 
and A^^^ to denote the i~th row of A, Throughout this section 
and the next one, denote the conjugate 

transpose of and B respectively. In the case of 

vectors, denotes the k-th iterate of x and in the case of 
matrices, denotes the k-th iterate of X. Let 

(x, Af.)) - b. 

f^(x) = X - , i=l,..,m (4.4.3)- 


F(b,x) = f (f (f C..(f (x)) .. ,))) 
1 ^ ^ m 


where 


^ *^(i) ^ ^ 1~1» • • • »roj 


• * 

f.(x) = p,x + -iA,,., i=l,...,m 
1 1 a . C i) ^ 


(4.4.4) 


(4.4.5) 


(u,v) denoting the standard inner product yf-u. It may be 
observed that 


where 

h = I - ai 

I being the nxn identity matrix. Let us define 


(4.4.6) 


(4.4.7) 


Qj_ a PiP2***Pi» i=l,...,m (4.4.8) 

and let us denote by Q. Let Pj and Pj^ respectively denote 
the orthogonal projections onto the subspaces Im A* and Ker A, 
where the symbols Im and Ker stand for the image (range) and 
Kernel (null space) of the corresponding mappings.^ Further, 
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define G as the nxm matrix (I-QPj) R where R is the nxm 
matrix whose i-th column is i=l»...»m, (The 

theory developed in Tanahe [3G5] guarantees that I-QPj is 
invertible) Choose an arbitrary vector x s € and determine 
the Sequence {x from the recurrence relation 

= F(b,x^), k=0,l,2,... (4.4.9) 

Then it has been established by Tanabe [305] through a chain 
of results that 

lim 3^ = p o + Gb. (4.4.10) 

k ^ K 

If the system Ax = b is consistent then the above limit is a 
solution of the system for arbitrary x°. If we choose x = 0 
then the above iterative method provides the minimum norm 
solution. Even if the system is inconsistent, the above limit 
exists. However, that need not be a least squares solution. 

The total number of multiplicative operations involved 
in the above method is {(2s+l)mk+s} where k is the number of 
iterations and s is the number of nonzero elements in A. The 
above algorithm has a simple geometric interpretation. By 
the mapping f^, a vector x e is projected on the hyperplane 
defined by A^^x = b.. Then F(b,x) is obtained from x, being 
projected succesively on the hyperplanes A^^x = b^, in the 
order i=ro,m-.l, . . . »1. This cycle forms a single iteration 
of the algorithm. In fact, if the hyperplanes are mutually 
orthogonal, then a single iteration yields an answer. 
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Based on the above method let us now describe a procedure 
to solve AX+XB=C with A = (a^ -) s M , B = (b - .) <s M , C = (c- .) 

lil 11 ■I’V 

s ^\n,n ^ ^ \i,n* course, we have already seen 

that the equation (4,4.1) can be recast as a linear system 


A+b. T I 
11 m 


bon lyvn 

21 m 


\2\ 


In m 


m 


^2n^ 


^nl^m 

^n2^m 


A+b I , 
nn mj 


xdf 


-(!)' 

^(2) 


C(2) 

• 

4n) 


• 

• 

'fXn) 



^ J 


and 

c(3) 


(4,4.11) 


are the j-th columns of X and C respectively. The matrix in 
the left hand side of (4.4.11) is called the Kronecker sum of 
A and B defined by 


M= + (B^01jj^). (4.4.12) 


Now we shall apply the Kaczmarz algorithm to solve the 
enlarged system and then revert back to the compact form. In 
this procedure we slightly deviate from the ordering used in 
Tanabe’s paper in the sense that we consider the hyperplanes 

in the order corresponding to <^ 21 * * * ' * *^ml*^l2* * * * * * 422 ’ • * “ * 
C-, ,,..,c and in Tanabe' s analysis it is jiist in the reverse 
order. There is no loss of generality in the above assumption 
as now only the first equation is considered first and so on. 

To formulate the algorithm, the only thing required is to 
identify the hyperplane corresponding to Cj_j in the enlarged 
system. The basic step is to write the analogue of (4.4.3) 
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For this, let us define 


“ij “ ^Vi)’Vi)^ + 2 ReCa^^b^J (4.4.13) 


and 


P,.. = (X^^^A*.J) + (b'^’.X*^^) - 0.^. 


11 


(4.4.14) 


To facilitate the presentation, we shall also define the 
following two mappings. Let 


such that 


where 


(i) 

T . j-l,...,n 

T^^\u) = U 


TT^O) , .^(i) - ' J. A 

U = u and U = 0, i / 


(i) 

In other words T ^ (u) is an mx n matrix whose j-th column 
is u and all other columns are zero. Similarly let 


such that 


where 


j \ ^ , . . . ,m 

(i) l,n m,n* 


"(i) 


V(i) = V and = 0, j i. 


Therefore, T(j_)(v) is an mXn matrix whose i-th row is v and 
all other rows are zero. It can be easily seen that the 
analogue of (4.4.3) is 

f (X) = X - d {T^j)(A*,) + T, . (B^j)*)} (4.4.15) 

ij ii ( i) ( i) 


where 


d.. = provided a.. / 0. (4.4,16) 

Xj Xj Xj J-J 
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ot _ — 0 g Isl us "tfiks 
ID 


fi^(X) =X. 


(4.4.17) 


li: may be nobed that if a . .=0, then A and -B have a common 

J-O 

eigenvalue. Thus vanishing of any a - . is an indication for 

-4j 

the system (4.4.1) to have more than one solution or no 
solution. However, vanishing of any aj_j is not a necessary 
condition for the system to be singular. Let 

F(CgX) = fjnn^^m-l,n^‘““^^ln^**'^^ml^^m-l,l*"^^ll^^^^‘*‘^^**'^^ 

...)). (4.4.18) 


Now choosing an initial approximation X^ <= let us 

generate the sequence of matrices by the relation 


X ^ =F(CgX), k=0gl,2g... (4.4.19) 

k+1 k 

Since this sequence can be identified with the sequence of 
vectors in as an immediate consequence of the convergence 

result established by Tanabe [305], it now follows that 

X^ exists and it represents a solution of (4.4.1) in case 
the system is consistent, If we consider the initial 

approximation as X^ = 0, then the above limit gives the minimum 
norm solution. 


Now we shall present the above algorithm suited to 
computer implementation. Here the algorithm is considered 
for the real case. 

Step 1. Choose any X^, for example X^=0, and set X=X^ . 
Step 2. Compute for i=l,...,m 
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Step 3. 


Compute for a=l,,..,n 



n 

I 

t=l 



Step 4. Set k=l. 

Step 5. For 1=1,... do? 

Skip this step for those j and i when 
p +q +2a b = 0 . 

i i ii iO 

m n 


Set (3 


Z 

s=l 


a. X . + Z X . 
IS so t«l it to 



d = l3/(p .+q .+2a. .b . .) 

•^1 ^0 11 31 

For s=l,,,,,m update x . as x .-da. 

so so IS 

For t=l,«..,n update x^^ as 

Step 6. If the matrix X (which is now the k-th iterate Xj^) 

satisfies an acceptance criterion for a convergence, 
go to Step 7. 

Step 7. The process is complete. 


. In the above algorithm, as in the Gauss-Seidel iteration, 
we renovate X successively in each iterative step. Hence it 
requires only one iterate to be kept in memory. However, if 
we consider the stopping criterion for Step 6 as to iterate 
until 

||X^ - < e (4.4.20) 

or 

< e (4.4,21) 

x^ir 
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e > 0 being some prescribed tolerance, then we require mn 
memory locations to keep the previous iterate. Another 
stopping criterion involving two successive iterates is to see 
whether the norm of the residual, i.e, ||C-AXj^-Xj^jl is very close 
to that corresponding to the previous iterate. In case the 
system has a solution, then 

11C-AX^-Xj^b|1 < e (4.4.22) 

can be considered to terminate the iterative process. These 

criteria involving residuals require extra multiplications 

2 2 

compared with (4.4,20). It may be noted that i|xj| = 2 * 

We have followed the stopping criterion as (4.4.20) in our 
numerical examples which will be presented in the end of this 
section. 

We shall now discuss some computational aspects of the 

above algorithm. The algoritlim is very simple to implement 

on a computer. The algorithm does not Involve inversion of 

matrices or transformation of matrices to canonical forms. 

o o 

Calculation of p. and q. values involves m +n multiplications. 

1 D 

In each iteration, there are at most mn steps and each step 
requires 2m+2n+3 multiplications (mHn+2 for p , 1 for d and 
m+n for updating X). Hence each iteration requires (2m+2n+3)mn 
multiplications whereas the Hoskins-Meek- wait on algorithm [205] 
requires ^(m^+n^) + mn{m+n) multiplications per iteration. 

In the present algorithm, calculations in stopping criterion 
involve mn multiplications. Therefore, the total number 
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of multiplicative operations involved in the process is at most 
2 2 

m +n +2mnk(m+n+2) , k being the number of iterations required. 
For sparse matrices, this figure is reduced considerably. 

It can be easily seen that the storage requirement for the 
proposed algorithm is m^+n^+2mn+m+n. It also requires 
additional mn locations for the previous iterate involved in the 
stopping criterion. If we do not count these mn locations 
then definitely for large systems this method requires smaller 
number of memory locations compared to the Bart els -Stewart 
algo'rithm and the Hoskins-Meek-Walton algorithm which require 
2m^+2n^+mn and (m+n) ^■hnax(m^,n^) locations respectively. Even 
if we take the storage requirement for the present algorithm as 
m^+n2+3mn+ra+ni, we are in a favourable situation in most cases 
(that is if m/n is different from 1 and at least one of m and 
n is sufficiently larg^, for example m=2n or n=2m. 

In conclusion, the Kaczmarz projection method proposed 
in this section is applicable to all classes of A and B. If 
is not close to zero then immediately we can infer 
that the system is inconsistent. Moreover this method seems 
less vulnerable to the growth of round-off errors. On the 
other hand, the convergence of the method is generally very 
slow compared to the Hoskins-Meek-Walton algorithm. Our 
computational experiments show that in some cases the total 
number of iterations required for convergence even to three 
or four decimeO. places is several hundred with each A and B 
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of size 10x10. However, there are systems which take only 
a small number of iterations. Some such examples follow. 

In these examples we give A, B and C and also the actual 
solution. To conserve space, the iterates are given only 
for selected values k. In all these examples X^ = 0 and the 
stopping criterion is taken as ii ^ ^ with e = 10 . 

EXAMPLE 4.4.1. A = [J 0] , B = [“2 -1] , c = [3 3] 

The system AX+XB=C is singular and consistent and therefore 

has infinitely many solutions. 

X = f -0.599999 1.200000 - 
^ 0,300000 0.100000 



-0.599999 1.200000 
2.991783 0.997261 


and 11X^5 jp = 11.7453 


From the output it is 


the solution 


- 0.6 


observed that the method converges to 

1 . 2 “! 


3.0 1.0 


EXAI'IPLE 4.4.2. We consider A and B as in the above 
example but with C = [^ so that the system AX+XB=C is 
inconsistent. 

We find that X^^ is approaching 


0.4 -0.8] 
3.0 l.ol 


When k=4l, X^ - 


0.400000 -0.800000 
2.960092 0.986697. 


Also llX f = 10,5357. 
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The above examples sh^w that the convergence is very- 
slow even for 2X2 system. As pointed out earlier, iterative 
raethods are usually api:licd to large linear systems with a 
sparse coefficient matrix. Now we shall present some examples 
involving lOxlO matrices. 


EXAMPLE 4.4,3. 


1 

3 

0 

6 

7 

0 

6 

0 8 

5 

' 

2 

0 

7 

2 

5 

3 

4 

1 

9 

fl 

9 

7 

2 

6 

3 

3 

0 

2 0 

7 


6 

2 

4 

9 

5 

1 

0 

3 

5 

5 

6 

0 

7 

8 

1 

0 

5 

1 4 

1 


5 

1 

8 

1 

7 

4 

0 

1 

2 

0 

2 

1 

4 

5 

6 

8 

1 

7 8 

0 


7 

2 

5 

3 

4 

6 

0 

2 

2 

6 

1 

2 

8 

2 

5 

5 

7 

1 9 

9 

’R - 

1 

4 

6 

5 

8 

0 

0 

2 

9 

6 

2 

3 

3 

4 

6 

8 

9 

4 2 

0 


2 

9 

6 

3 

8 

4 

0 

3 

1 

1 

5 

5 

1 

1 

7 

2 

4 

2 8 

6! 


6 

9 

2 

5 

1 

4 

0 

2 

2 

5 

8 

9 

8 

0 

1 

2 

8 

2 1 

2 


0 

1 

8 

0 

5 

2 

2 

2 

1 

7 

0 

0 

3 

2 

0 

5 

8 

8 1 

3 

i 

5 

3 

7 

1 

1 

7 

9 

7 

6 

2 

3 

7 

4 

3 

1 

3 

4 

5 1 

3 


5 

5 

p; 

5 

0 

3 

1 

5 

5 

9 


C is cxhosen as AJ+JB where J is the matrix of unities, 
i.e., the matrix whose elements are all 1. 


After 36 iterations, ';he termination criterion is 
satisfied, i.e. “ ^35^^ < 10 , We find that 

= 100.00396, The matrix given below is when k=36. 

(We give only upto three decimal places) 


1.000 

0.998 

1.003 

0.999 

0.996 

0.995 

1.002 

1.004 
1.002 
0.998 


1.010 0.996 1.007 
0.999 1.005 0.994 

1.000 1.003 1.002 
0.999 1.003 0.990 
0,995 0.992 1.011 
0.989 1.003 1.002 

1.006 0.999 1.002 
0,998 0.992 1.005 
1,002 1.008 0.988 
1.000 1.000 1.000 


1.004 
1.000 
1,012 
0.991 

1.009 

1.005 
0.996 
1.000 
0.990 
0.997 


1.004 

0.999 

1 

.001 

0.991 

0,996 

0.998 

1 

.000 

1.003 

0.998 

0.993 

0. 

,990 

0.993 

1.006 

1.012 

1 

.005 

0.997 

1.002 

0.996 

0 

.989 

1.002 

0.997 

1.002 

1 

.000 

1.003 

1.001 

1.001 

1 

.005 

0.997 

0.996 

0.993 

0 

.996 

l.OlO 

1.001 

1.009 

1 

.011 

1.000 

1.001 

1.000 

0 

.997 

1.003 


1.007 

X.OO5 

1.001 

1.000 

1.002 

0.997 

1.000 

0.998 

1.000 
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EXAMPLE 4.4.4, Let A = (aj[Q) 4e defined by 

={i 

0 otherwise. 

By considering B as A and C as AJ+JB where J is the matrix of 
unities we find that the iteration terminates in 21 steps . 

99.8269 and I1X2^-X2 q II = 1.3 x 10“^. 

Finally let us consider a system AX+XB=C whose enlarged 
system is tridiagonal. 

EXAMPLE 4.4,5. Bet A be defined as in the preceding 
example. Let B = diag(l,2 , , . . ,10) and C= AJ+JB where J is 
the matrix of unities. In this case we find that ||x [f- = 
99,998758 and all the elements in X^, except the first three 
elements in the first column, are either 0.99999 or 1,00000. 
The first three elements in the first column are respectively 

0.99879, 1.00QB2 and I.OOOO 7 . 

In the light of above examples, we recommend that the 
Kaczmarz projection method as explained in this section 
maybe used to solve large sparse systems, 

4,5. The Residual Projection Method for Solving AX+XB=C 

In this section, we formulate the residual projection 
method for solving AX+XB=C with A, B, C, X as assumed in the 
preceding section. The key te this method is once again to 
consider the enlarged system (4.4.11) and then to apply the 
residual projection method analysed in Rathore [301] for 



208 


solving linear systems and then to rewrite the computations 
in the compact form suited to the original matrix equation. 

To begin with let us briefly outline with some 

simplifications, the residual projection method as given in 

Rathore [30l] for solving Ax = b, A s M . (The method 

rn j ri 

as considered in [30l] incorporates a general inner product on 
E™ whereas in the following it has been specialised to the 

usual one, i.e., (u,v) = v*u.) 


The residual projection algorithm generates a sequence 
of vectors {x^}in which the j-th component x^ of x is 
computed through 



J J 

(4.5.1) 


j 

k+1 k k+1 

(4.5.2) 


(4.5.3) 

and 


(4.5.4) 

where 

k+1 k+1 , 0 k,n if—Q 1 o 

= I* = r , K— O , X , ( • • 

(4.5.5) 

and 

0,0 o , . 0 

r = r = b-Ax 

(4.5.6) 


It has been established in [30l] that the vector sequences 
and {x^} thus generated converge for arbitrary A, b and x° 

^°°^llm = p b (4.5.7) 

k K 

x~ = lim x^ = + Gb 

k “ 


and 


(4.5.8) 
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where is the orthogonal projection onto Ker A*, P is the 

projection onto Ker A along Im T, with T being the conjugate 

transpose of the rax n matrix whose j-th colimin is S. A^^^/a » 

j""! j 

j 1 y • * • ^n« here 


S. = RfR^...R^ , S = I , S = S* 
3 12 j o m’ n 

R. = I - 

0 m 0 


(4.5.9) 

(4.5.10) 


In (4.5.8) » G = T(I -SPj) with Pj denoting the orthogonal 

projection 

onto Im A. Here the invertibility of I^-SPj is guaranteed 
through the theory developed in [301]. It has also been 
characterized in [50l] that x° given by (4.5.8) minimizes 
l|b-Ax|i. 


Now We shall apply the above method to the enlarged system 
(4.4,11) and rewrite the calculations suited to the original form 
(4.4.1) . The algorithm can be described as follows. Let 

a.. = (A^^^A^^b + (B*. ) + 2 Re(5..b..) 

13 (3) (3) 11 33 ( 4 . 5 . 11 ) 

! ^ij " + ^^(i)'®('j)^ (4.5.12) 

where R is the updated residual C-AX-XB. Let 

d.. = { “id ^ (4.5.13) 

^d 0 if a • . = 0 . 

X J 

Then update Xi^ by adding di^ to the existing 3^ and update 

the residual by R-d^JT (A^^h (B(-i))}» R being the 

existing residual and T^3/ mappings defined as 

in the previous section. If we complete these steps for 
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3=15. ♦.»np i=l5,.,,m then we say one iteration is over. We 
continue the process until the required accuracy is achieved. 


The step-by-step procedure of the above algorithm is 
presented below for the real case. 

Step 1. Choose any for example X ^=0 and set X=X^. 
Step 2 . Set R = (r. .) = C-AX-XB 
Step 3 . Compute for i«l , . . . ,m 

IQ 9 

P. 

5 —X o X 

Step 4 . Compute for 

n 2 
q = 2 

3 t=l 3 "^ 

Step 5 • Set k=l . 

Step 6 . F or 3 ~i 5 ... 9 9 i— 1 9 . • .9m do . 

Skip this step for those 3 sold i when 

p +q + 2 a . . b . . = 0 . 

^1 ^3 11 33 

S®* P = Jl * Jl 

Update X. , as x. .+d. 

13 13 

For s=l,...,m update r as r -da 

SJ SJ bX 


For t=l,...,n update r^_^ as r^_^-db^_^. 

Step 7. Tf the matrix X (which is now the k-th iterate Xy) 
or the matrix R (which is now the k-th iterate Rj^.) 
satisfies an acceptance criterion for a convergence 
go to Step 8 . 



EXAMPLE 4.5.1. 


Consider A, B, C as in Example 4.4.1. 

By the residual projection method we find that converges 

r-3 On . 

■to L 2 in 22 iterations. It may be observed that this is 
not a minimum norm solution. 


EXAI'IPLE 4.5 »2. Consider A, B, C as in Example 4.4,2. 

We find that 11^^11 converges to 12.5 and in this case 
converges to ^°o^' ^56 given as 

lO. 5000000 0,0000000' 

_2,9999923 0,9999982^1 

with IKcf = 10.249951. 

5b 

For the matrices A, B and C as given in Example 4.4.3 

we executed the residual projection algorithm. We noted that 
the number of iterations exceeded eight thousands whereas 
by the Kaczmarz projection method, the solution was obtained in 
36 iterations. 


EXAIVIPLE 4.5.3. Consider A, B, C as in Example 4.4.4. 


After 91 iterations, the termination criterion is satisfied, 


i.e., ||C-AZg^-X^B 1^ < 10 -S. 


X is given below upto three 
91 


decimal places. 

1.001 0.999 1.000 
0.999 1.001 1.000 

1.000 1.000 1.000 

1.000 1.000 1.000 

1.000 1.000 1.000 

1.000 1.000 1.000 

1.000 1.000 1.000 

1.000 1.000 1.000 

1,000 1.000 1.000 

1.000 1.000 1,000 


1.000 0.996 1.020 

1.000 1.001 0.995 

1.000 1.000 1.001 

1.000 1.000 1.000 

1.000 1.000 1.000 

1.000 1.000 1.000 

1.000 1.000 1.000 

1.000 1.000 1.000 

1.000 1.000 1.000 

1,000 1.000 1.000 


0.869 1.015 0.998 
1.015 0.997 1.000 
0.998 1.000 1.000 

1.000 1.000 1.000 

1.000 1.000 1,000 

1.000 1.000 1.000 

1.000 1.000 1.000 

1.000 1,000 1.000 

1.000 1.000 1.000 

1.000 1.000 1.000 


1.000 

1.000 

1.000 

1.000 

1.000 

1.000 

1.000 

1.000 

1,000 

1,000 
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example 4.5,4. For the matrices A, B, C as given in 
Example 4,4,5? we find by the residual projection method 
that becomes less than 10 in 15 iterations. 
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